PyTorch Implementation Series: ShuffleNet v1 & v2

Aug 17, 2024

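Zhang et al. (2017, Megvii) proposed ShuffleNet in "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices". Its defining feature is the pairing of group convolution, which cuts computation, with channel shuffle, which mixes information across channels. The cross-channel mixing is loosely similar in spirit to Squeeze-and-Excitation.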

Network concept

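After grouping, each group of channels is processed independently: the red group, for example, cannot draw on information from the green group. Shuffling the channels and regrouping them lets information keep flowing between groups.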

channel shuffle in paper[https://arxiv.org/abs/1707.01083]
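The regrouping described above can be sketched without any tensor library. This is a minimal sketch, assuming the channels are held in a flat Python list; the function name `channel_shuffle` and the colour labels are illustrative and are not taken from the repository linked below. It treats the list as a (groups, channels_per_group) grid, transposes it, and flattens it again, which is the same index permutation a PyTorch version would get from a reshape, a transpose of the two channel axes, and a reshape back.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so that every new group draws from every old group."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Visiting index (g, i) with i in the outer loop and g in the inner loop
    # is the transpose of a (groups, per_group) layout, flattened to a list.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Three groups (red, green, blue) with three channels each.
before = ["r0", "r1", "r2", "g0", "g1", "g2", "b0", "b1", "b2"]
after = channel_shuffle(before, groups=3)
print(after)
```

With three groups of three channels, every consecutive triple in `after` holds one red, one green, and one blue channel, so the next group convolution sees all three colours.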

V2 goes further and examines where MAC (memory access cost) and FLOPs (floating-point operations) can be optimized, and it summarizes the findings in four guidelines.

1. Equal input and output channel widths minimize MAC.

For a 1x1 convolution with fixed FLOPs, MAC reaches its lower bound when the input and output channel widths are equal.

2. Excessive group convolution increases MAC.

For a fixed input shape and a fixed computational cost (FLOPs), MAC grows linearly with the number of groups.

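3. Network fragmentation reduces the degree of parallelism, as in Inception.

4. Element-wise operations have a high MAC/FLOPs ratio, so their cost cannot be ignored in lightweight networks.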
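Guidelines 1 and 2 can be checked with a few lines of arithmetic. This is a rough sketch, assuming the cost model the v2 paper uses for a 1x1 convolution: FLOPs B = hw·c_in·c_out/g and MAC = hw(c_in + c_out) + c_in·c_out/g. The helper names and the 56x56 feature map with 128 channels are illustrative choices, not values from the author's experiments.

```python
import math


def conv1x1_flops(h, w, c_in, c_out, groups=1):
    """Multiply-accumulate count B = h*w*c_in*c_out/groups."""
    return h * w * c_in * c_out // groups


def conv1x1_mac(h, w, c_in, c_out, groups=1):
    """Input-map reads + output-map writes + weight reads."""
    return h * w * (c_in + c_out) + c_in * c_out // groups


H = W = 56

# Guideline 1: hold B fixed (c_in * c_out = 128 * 128) and vary the ratio.
mac_by_ratio = {
    (c_in, c_out): conv1x1_mac(H, W, c_in, c_out)
    for c_in, c_out in [(128, 128), (64, 256), (32, 512)]
}
B = conv1x1_flops(H, W, 128, 128)
lower_bound = 2 * math.sqrt(H * W * B) + B / (H * W)

# Guideline 2: hold the input shape and B fixed, so c_out must grow with groups.
mac_by_groups = {
    g: conv1x1_mac(H, W, 128, 128 * g, groups=g) for g in (1, 2, 3, 4)
}

for shape, mac in mac_by_ratio.items():
    print("c_in, c_out =", shape, "MAC =", mac)
print("lower bound =", lower_bound)
for g, mac in mac_by_groups.items():
    print("groups =", g, "MAC =", mac)
```

Under this model the balanced 128/128 layer sits exactly on the lower bound, and each extra group adds the same amount of memory traffic while the FLOPs stay constant.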

Network structure

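The structure resembles ResNet's bottleneck block: the 1x1 convolutions become group convolutions and the 3x3 convolution becomes depthwise, which reduces the parameter count. The merge point uses concat in place of addition (in v1 only in the stride=2 unit, in v2 in both units). This lightens the bottleneck computation and removes the requirement that the two branches have equal channel counts, although it makes the channel bookkeeping less intuitive to implement.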

In v1, channel shuffle comes right after the first 1x1 group convolution. In v2 it moves to the end of the block, after the concat, and the stride=1 unit begins with a channel split.
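Because the concat makes the branch widths easy to get wrong, it can help to write the bookkeeping down explicitly. The sketch below is an illustration under my reading of the two papers' unit diagrams, not the code from the repository; `branch_channels` is a hypothetical helper, and the widths in the usage lines (24 to 240 for v1 with groups=3, 24 to 116 for v2 at 1x) are assumed from the papers' architecture tables.

```python
def branch_channels(version, in_channels, out_channels, stride):
    """Return (shortcut_out, residual_out, merge_op) for one unit."""
    if stride not in (1, 2):
        raise ValueError("stride must be 1 or 2")
    if version == "v1":
        if stride == 1:
            # Element-wise add: both branches carry the full unit width.
            if in_channels != out_channels:
                raise ValueError("v1 stride=1 needs in_channels == out_channels")
            return in_channels, out_channels, "add"
        # stride == 2: the average-pooled shortcut keeps in_channels, so the
        # residual branch supplies only the remaining channels before concat.
        return in_channels, out_channels - in_channels, "concat"
    if version == "v2":
        if out_channels % 2 != 0:
            raise ValueError("v2 needs an even out_channels")
        if stride == 1:
            # Channel split: one half passes through, the other is convolved.
            if in_channels != out_channels:
                raise ValueError("v2 stride=1 needs in_channels == out_channels")
            return in_channels // 2, in_channels // 2, "concat"
        # stride == 2: no split; each branch sees every input channel and
        # produces half of the output channels.
        return out_channels // 2, out_channels // 2, "concat"
    raise ValueError(f"unknown version: {version}")


print(branch_channels("v1", 24, 240, stride=2))
print(branch_channels("v2", 24, 116, stride=2))
print(branch_channels("v2", 116, 116, stride=1))
```

The v1 stride=2 case is the unintuitive one: the residual branch must be built with `out_channels - in_channels` outputs, and that number also has to satisfy the group-divisibility constraints.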

(a) and (b) are the v1 units; (c) and (d) are the v2 units.

v1 architecture in paper[https://arxiv.org/abs/1707.01083]

v2 architecture in paper[https://arxiv.org/abs/1807.11164v1]

Dataset

GTSRB, a German traffic sign recognition dataset with a little over 26,000 images and 43 classes.

Evaluation

v1: test accuracy 75%, model size 3.8 MB (groups=3, scale_factor=1).

v2: test accuracy 81.56%, model size 5.12 MB (groups=4, scale_factor=1).

v1 confusion matrix

v2 confusion matrix

Notes

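When choosing parameters, you have to keep checking that the channel counts are divisible by the number of groups.
For v1, the combinations of groups = {2, 4, 8} with scale_factor = {0.25, 0.5, 1.5} could not be built. Judging from the channel sizes v2 uses, the fix is probably to spend time nudging the widths to the nearest value that divides evenly XDD
The appendix of the v2 paper has more material worth consulting, including a larger variant and an SE variant.
torchvision ships only v2, not v1.
I have previously deployed CNNs on an Arduino. In extremely tight memory, ShuffleNet v2 gave the best value for its size, MobileNet v2 was too large, and SqueezeNet's operators were not well enough supported by tflite.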
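The divisibility constraint from the first two notes can be checked before building a model. This is a minimal sketch, assuming a v1-style unit whose bottleneck is a quarter of the output width; `group_conv_problems` and `round_up_to_multiple` are hypothetical helpers, and the base width of 200 for groups=2 is assumed from the v1 paper's architecture table. It covers only the output and bottleneck widths, not every layer of a full network.

```python
def group_conv_problems(out_channels, groups, bottleneck_ratio=4):
    """Return a list of problems; an empty list means the widths are usable."""
    problems = []
    if out_channels % bottleneck_ratio != 0:
        problems.append("bottleneck width is not an integer")
    elif (out_channels // bottleneck_ratio) % groups != 0:
        problems.append("bottleneck width is not divisible by groups")
    if out_channels % groups != 0:
        problems.append("output width is not divisible by groups")
    return problems


def round_up_to_multiple(value, divisor):
    """Smallest multiple of divisor that is >= value (one possible fix)."""
    return -(-value // divisor) * divisor


groups, scale_factor = 2, 0.25
width = int(200 * scale_factor)  # 200: assumed v1 stage-2 width for groups=2
print(width, group_conv_problems(width, groups))

# A width that is a multiple of groups * bottleneck_ratio passes both checks.
fixed = round_up_to_multiple(width, groups * 4)
print(fixed, group_conv_problems(fixed, groups))
```

Rounding up to a multiple of `groups * bottleneck_ratio` is only one way to repair a width. It changes the model size slightly, so the result is no longer the exact configuration from the paper.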

Implementation

GitHub - gitE0Z9/pytorch-implementations: Deep learning models implemented in PyTorch

References

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. https://arxiv.org/abs/1707.01083

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. https://arxiv.org/abs/1807.11164v1

Written by mz bai
