MobileNet V1
MobileNets support a wide range of tasks, including object detection, fine-grained classification, face attribute recognition, and large-scale geo-localization. They were designed for mobile and embedded vision applications, and are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks.
The authors introduce two simple global hyperparameters, a width multiplier and a resolution multiplier, that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem.
Network Architecture
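As a concrete illustration of the building block, here is a minimal sketch in PyTorch (the framework choice and the class name are assumptions for illustration; the official implementation referenced at the end uses TensorFlow Slim). It factors a standard convolution into a 3 x 3 depthwise convolution followed by a 1 x 1 pointwise convolution, and applies the width multiplier, one of the two global hyperparameters, to the channel counts.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """MobileNet V1-style block: 3x3 depthwise conv followed by 1x1 pointwise conv."""

    def __init__(self, in_channels, out_channels, stride=1, width_multiplier=1.0):
        super().__init__()
        # The width multiplier (one of the two global hyperparameters) thins
        # every layer uniformly by scaling its channel count.
        in_channels = max(1, int(in_channels * width_multiplier))
        out_channels = max(1, int(out_channels * width_multiplier))
        self.block = nn.Sequential(
            # Depthwise: one 3x3 filter per input channel (groups == channels).
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            # Pointwise: 1x1 conv that mixes channels and sets the output width.
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# Example: a nominal 32->64 block at width multiplier 0.5 operates on 16->32 channels,
# so the input tensor is already thinned to 16 channels.
block = DepthwiseSeparableConv(32, 64, stride=1, width_multiplier=0.5)
out = block(torch.randn(1, 16, 112, 112))
print(out.shape)  # torch.Size([1, 32, 112, 112])
```

The second hyperparameter, the resolution multiplier, simply scales the input image resolution and therefore requires no change to the block itself.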
MobileNet V2
MobileNet V2 builds upon the ideas from MobileNet V1 [1], using depthwise separable convolutions as efficient building blocks. However, V2 introduces two new features to the architecture: 1) linear bottlenecks between the layers, and 2) shortcut connections between the bottlenecks.
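To make the two new features concrete, the following is a minimal PyTorch sketch (an illustrative assumption, not the paper's reference code) of an inverted residual block: a 1 x 1 expansion, a 3 x 3 depthwise convolution, a linear (activation-free) 1 x 1 bottleneck projection, and a shortcut that connects bottleneck to bottleneck when shapes allow.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNet V2-style bottleneck: expand -> depthwise -> linear projection."""

    def __init__(self, in_channels, out_channels, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_channels * expand_ratio
        # The shortcut connects bottleneck to bottleneck, and only when the
        # spatial size and channel count are preserved.
        self.use_shortcut = stride == 1 and in_channels == out_channels
        self.block = nn.Sequential(
            # 1x1 expansion to a higher-dimensional representation.
            nn.Conv2d(in_channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution on the expanded representation.
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear bottleneck: no activation, so the low-dimensional
            # projection does not destroy information.
            nn.Conv2d(hidden, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


x = torch.randn(1, 24, 56, 56)
print(InvertedResidual(24, 24)(x).shape)  # torch.Size([1, 24, 56, 56])
```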
MobileNet V3
MobileNet V3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then further improved through novel architecture advances. Two variants are released, MobileNetV3-Large and MobileNetV3-Small; a minimal code comparison of the two follows the list below.
· MobileNetV3-Large
o Targets high-resource use cases
o 3.2% more accurate on ImageNet classification while reducing latency by 20% compared to MobileNet V2.
o MobileNetV3-Large is over 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection.
· MobileNetV3-Small
o Targets low-resource use cases
o 6.6% more accurate compared to a MobileNetV2 model with comparable latency.
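A rough way to see the Large/Small resource trade-off is to instantiate both variants and compare their parameter counts; the sketch below assumes the torchvision implementation (torchvision >= 0.9) purely for convenience.

```python
import torch
from torchvision.models import mobilenet_v3_large, mobilenet_v3_small

# Instantiate both variants (randomly initialized; load pretrained weights in
# your torchvision version's preferred way if you need ImageNet accuracy).
large = mobilenet_v3_large().eval()
small = mobilenet_v3_small().eval()

def count_params(model):
    return sum(p.numel() for p in model.parameters())

x = torch.randn(1, 3, 224, 224)  # standard ImageNet-sized input
print("Large params:", count_params(large), large(x).shape)  # ~5.4M, [1, 1000]
print("Small params:", count_params(small), small(x).shape)  # ~2.5M, [1, 1000]
```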
MobileNet V3 Definitions
Semantic Segmentation: LR-ASPP (Lite Reduced Atrous Spatial Pyramid Pooling)
R-ASPP is a reduced design of the Atrous Spatial Pyramid Pooling (ASPP) module, which adopts only two branches: a 1 x 1 convolution and a global-average pooling operation.
Lite R-ASPP (LR-ASPP), improving over R-ASPP, deploys global-average pooling in a fashion similar to the Squeeze-and-Excitation module: the authors employ a large pooling kernel with a large stride (to save some computation) and only one 1 x 1 convolution in the module. The authors apply atrous convolution to the last block of MobileNet V3 to extract denser features, and further add a skip connection from low-level features to capture more detailed information. (A simplified sketch of this head is given after the bullet below.)
LR-ASPP is a new, efficient segmentation decoder; combined with MobileNet V3 it achieves new state-of-the-art results for mobile classification, detection, and segmentation.
o MobileNetV3-Large LR-ASPP is 34% faster than MobileNetV2 R-ASPP at similar accuracy on Cityscapes segmentation.
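The following is a simplified PyTorch sketch of the LR-ASPP idea described above; the layer widths, pooling kernel, and class name are illustrative assumptions rather than the exact configuration from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LiteRASPPHead(nn.Module):
    """Simplified LR-ASPP-style segmentation head (illustrative sizes)."""

    def __init__(self, high_channels, low_channels, num_classes, inter_channels=128):
        super().__init__()
        # Branch 1: a single 1x1 convolution on the high-level features.
        self.conv = nn.Sequential(
            nn.Conv2d(high_channels, inter_channels, 1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
        )
        # Branch 2: large-kernel, large-stride average pooling followed by a
        # 1x1 convolution and a sigmoid, used as an SE-style gate.
        self.scale = nn.Sequential(
            nn.AvgPool2d(kernel_size=25, stride=8, padding=12),
            nn.Conv2d(high_channels, inter_channels, 1, bias=False),
            nn.Sigmoid(),
        )
        # Per-branch classifiers; the low-level skip adds fine detail.
        self.high_classifier = nn.Conv2d(inter_channels, num_classes, 1)
        self.low_classifier = nn.Conv2d(low_channels, num_classes, 1)

    def forward(self, high, low):
        x = self.conv(high)
        s = self.scale(high)
        s = F.interpolate(s, size=x.shape[-2:], mode="bilinear", align_corners=False)
        x = x * s  # SE-like gating of the 1x1-convolved features
        x = F.interpolate(x, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return self.high_classifier(x) + self.low_classifier(low)


# Toy shapes: high-level features at 1/16 resolution, low-level skip at 1/8.
high = torch.randn(1, 960, 32, 64)
low = torch.randn(1, 40, 64, 128)
print(LiteRASPPHead(960, 40, num_classes=19)(high, low).shape)  # [1, 19, 64, 128]
```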
Performance
References
[1] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
    https://arxiv.org/abs/1704.04861
[2] Review: MobileNetV1 — Depthwise Separable Convolution (Light Weight Model)
[3] MobileNetV2: Inverted Residuals and Linear Bottlenecks
    https://arxiv.org/abs/1801.04381
[4] MobileNetV2: The Next Generation of On-Device Computer Vision Networks
    https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html
[5] https://www.infoq.cn/article/hJrpveTTROF4YafNHWtD
[6] Searching for MobileNetV3
    https://arxiv.org/abs/1905.02244
[7] TensorFlow Slim MobileNet implementations
    https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet