MobileNet V1
MobileNets support a wide range of tasks, including object detection, fine-grained classification, face attribute recognition, and large-scale geo-localization. They were designed for mobile and embedded vision applications, and are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks.
The authors introduce two simple global hyperparameters, a width multiplier and a resolution multiplier, that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem.
Network Architecture
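As a concrete illustration of the building block, here is a minimal sketch in PyTorch (the framework choice and the class name are assumptions for illustration; the official implementation referenced at the end uses TensorFlow Slim). It factors a standard convolution into a 3 x 3 depthwise convolution followed by a 1 x 1 pointwise convolution, and applies the width multiplier, one of the two global hyperparameters, to the channel counts.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """MobileNet V1-style block: 3x3 depthwise conv followed by 1x1 pointwise conv."""

    def __init__(self, in_channels, out_channels, stride=1, width_multiplier=1.0):
        super().__init__()
        # The width multiplier (one of the two global hyperparameters) thins
        # every layer uniformly by scaling its channel count.
        in_channels = max(1, int(in_channels * width_multiplier))
        out_channels = max(1, int(out_channels * width_multiplier))
        self.block = nn.Sequential(
            # Depthwise: one 3x3 filter per input channel (groups == channels).
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            # Pointwise: 1x1 conv that mixes channels and sets the output width.
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# Example: a nominal 32->64 block at width multiplier 0.5 operates on 16->32 channels,
# so the input tensor is already thinned to 16 channels.
block = DepthwiseSeparableConv(32, 64, stride=1, width_multiplier=0.5)
out = block(torch.randn(1, 16, 112, 112))
print(out.shape)  # torch.Size([1, 32, 112, 112])
```

The second hyperparameter, the resolution multiplier, simply scales the input image resolution and therefore requires no change to the block itself.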
MobileNet V2
MobileNet V2 builds upon the ideas from MobileNet V1 [1], using depthwise separable convolutions as efficient building blocks. However, V2 introduces two new features to the architecture: 1) linear bottlenecks between the layers, and 2) shortcut connections between the bottlenecks.
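To make the two new features concrete, the following is a minimal PyTorch sketch (an illustrative assumption, not the paper's reference code) of an inverted residual block: a 1 x 1 expansion, a 3 x 3 depthwise convolution, a linear (activation-free) 1 x 1 bottleneck projection, and a shortcut that connects bottleneck to bottleneck when shapes allow.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNet V2-style bottleneck: expand -> depthwise -> linear projection."""

    def __init__(self, in_channels, out_channels, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_channels * expand_ratio
        # The shortcut connects bottleneck to bottleneck, and only when the
        # spatial size and channel count are preserved.
        self.use_shortcut = stride == 1 and in_channels == out_channels
        self.block = nn.Sequential(
            # 1x1 expansion to a higher-dimensional representation.
            nn.Conv2d(in_channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution on the expanded representation.
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear bottleneck: no activation, so the low-dimensional
            # projection does not destroy information.
            nn.Conv2d(hidden, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


x = torch.randn(1, 24, 56, 56)
print(InvertedResidual(24, 24)(x).shape)  # torch.Size([1, 24, 56, 56])
```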
MobileNet V3
MobileNet V3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then further improved through novel architecture advances. Two variants are released, MobileNetV3-Large and MobileNetV3-Small; a minimal code comparison of the two follows the list below.
· MobileNetV3-Large
o Targets high-resource use cases
o 3.2% more accurate on ImageNet classification while reducing latency by 20% compared to MobileNet V2.
o MobileNetV3-Large is over 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection.
· MobileNetV3-Small
o Targets low-resource use cases
o 6.6% more accurate compared to a MobileNetV2 model with comparable latency.
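A rough way to see the Large/Small resource trade-off is to instantiate both variants and compare their parameter counts; the sketch below assumes the torchvision implementation (torchvision >= 0.9) purely for convenience.

```python
import torch
from torchvision.models import mobilenet_v3_large, mobilenet_v3_small

# Instantiate both variants (randomly initialized; load pretrained weights in
# your torchvision version's preferred way if you need ImageNet accuracy).
large = mobilenet_v3_large().eval()
small = mobilenet_v3_small().eval()

def count_params(model):
    return sum(p.numel() for p in model.parameters())

x = torch.randn(1, 3, 224, 224)  # standard ImageNet-sized input
print("Large params:", count_params(large), large(x).shape)  # ~5.4M, [1, 1000]
print("Small params:", count_params(small), small(x).shape)  # ~2.5M, [1, 1000]
```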
MobileNet V3 Definitions
Semantic Segmentation: LR-ASPP (Lite Reduced Atrous Spatial Pyramid Pooling)
R-ASPP is a reduced design of the Atrous Spatial Pyramid Pooling (ASPP) module, which adopts only two branches: a 1 x 1 convolution and a global-average pooling operation.
Lite R-ASPP (LR-ASPP), improving over R-ASPP, deploys global-average pooling in a fashion similar to the Squeeze-and-Excitation module: the authors employ a large pooling kernel with a large stride (to save some computation) and only one 1 x 1 convolution in the module. The authors apply atrous convolution to the last block of MobileNet V3 to extract denser features, and further add a skip connection from low-level features to capture more detailed information. (A simplified sketch of this head is given after the bullet below.)
LR-ASPP is a new, efficient segmentation decoder; combined with MobileNet V3 it achieves new state-of-the-art results for mobile classification, detection, and segmentation.
o MobileNetV3-Large LR-ASPP is 34% faster than MobileNetV2 R-ASPP at similar accuracy on Cityscapes segmentation.
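The following is a simplified PyTorch sketch of the LR-ASPP idea described above; the layer widths, pooling kernel, and class name are illustrative assumptions rather than the exact configuration from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LiteRASPPHead(nn.Module):
    """Simplified LR-ASPP-style segmentation head (illustrative sizes)."""

    def __init__(self, high_channels, low_channels, num_classes, inter_channels=128):
        super().__init__()
        # Branch 1: a single 1x1 convolution on the high-level features.
        self.conv = nn.Sequential(
            nn.Conv2d(high_channels, inter_channels, 1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
        )
        # Branch 2: large-kernel, large-stride average pooling followed by a
        # 1x1 convolution and a sigmoid, used as an SE-style gate.
        self.scale = nn.Sequential(
            nn.AvgPool2d(kernel_size=25, stride=8, padding=12),
            nn.Conv2d(high_channels, inter_channels, 1, bias=False),
            nn.Sigmoid(),
        )
        # Per-branch classifiers; the low-level skip adds fine detail.
        self.high_classifier = nn.Conv2d(inter_channels, num_classes, 1)
        self.low_classifier = nn.Conv2d(low_channels, num_classes, 1)

    def forward(self, high, low):
        x = self.conv(high)
        s = self.scale(high)
        s = F.interpolate(s, size=x.shape[-2:], mode="bilinear", align_corners=False)
        x = x * s  # SE-like gating of the 1x1-convolved features
        x = F.interpolate(x, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return self.high_classifier(x) + self.low_classifier(low)


# Toy shapes: high-level features at 1/16 resolution, low-level skip at 1/8.
high = torch.randn(1, 960, 32, 64)
low = torch.randn(1, 40, 64, 128)
print(LiteRASPPHead(960, 40, num_classes=19)(high, low).shape)  # [1, 19, 64, 128]
```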
Performance
References
[1] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
    https://arxiv.org/abs/1704.04861
[2] Review: MobileNetV1 — Depthwise Separable Convolution (Light Weight Model)
[3] MobileNetV2: Inverted Residuals and Linear Bottlenecks
    https://arxiv.org/abs/1801.04381
[4] MobileNetV2: The Next Generation of On-Device Computer Vision Networks
    https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html
[5] https://www.infoq.cn/article/hJrpveTTROF4YafNHWtD
[6] Searching for MobileNetV3
    https://arxiv.org/abs/1905.02244
[7] TensorFlow Slim MobileNet implementations
    https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet