Model Zoo

Ready-to-use, open-source models

YoloV5 Nano

Real-time object detection with YoloV5n, pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
/

YoloV4-tiny

Real-time object detection with YoloV4-tiny, pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
39.8

Mobile Object Localizer

A class-agnostic mobile object detector.

Resolution
192x192x3

Task type
detection

FPS
41.02

Mask R-CNN

Instance segmentation with Mask R-CNN, pre-trained on the COCO dataset.

Resolution
300x300x3

Task type
instance_segmentation

FPS
3.11

MegaDepth

Estimate depth from an RGB image.

Resolution
256x192x3

Task type
monocular_depth_estimation

FPS
4.8

YOLOP

You Only Look at Once for Panoptic driving Perception.

Resolution
320x320x3

Task type
detection

FPS
11.4

FastDepth

Estimate depth from RGB images using FastDepth from MIT.

Resolution
320x256x3

Task type
monocular_depth_estimation

FPS
40.32

person-reidentification-retail-0031

A person reidentification model for general scenarios. It takes a whole-body image as input and outputs an embedding vector; two images are matched by the cosine distance between their embeddings. The model is based on the RMNet backbone, which was developed for fast inference. A single reidentification head on the 1/16-scale feature map outputs an embedding vector of 256 floats.

Resolution
48x96x3

Task type
named_entity_recognition

FPS
60

YoloV4

Object detection with YoloV4, pre-trained on the COCO dataset.

Resolution
608x608x3

Task type
detection

FPS
1.31

YoloV3

Object detection with YoloV3, pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
4.16

DM-Count

Count dense or sparse crowds using density maps.

Resolution
960x540x3

Task type
feature_extraction

FPS
0.22

FastDepth

Estimate depth from RGB images using FastDepth from MIT.

Resolution
640x480x3

Task type
monocular_depth_estimation

FPS
???

Facial landmarks 68 detection

Detect 68 facial landmarks.

Resolution
160x160x3

Task type
head_pose_estimation

FPS
0

Depth MobileNetV2

Estimate depth from an RGB image.

Resolution
320x240x3

Task type
monocular_depth_estimation

FPS
20.03

HR-Depth

Depth estimation from an RGB image using the HR-Depth model.

Resolution
256x192x3

Task type
monocular_depth_estimation

FPS
4.8

Image Quality Assessment Classification

Image quality assessment from an RGB image using EdgeSegNet-Classifier.

Resolution
256x256x3

Task type
classification

FPS
13.65

SC-Depth

Depth estimation from an RGB image using the SC-Depth model.

Resolution
512x256x3

Task type
monocular_depth_estimation

FPS
12.89

EAST

Detect text in images using the EAST model.

Resolution
256x256x3

Task type
detection

FPS
22.5

Depth MobileNetV2

Depth estimation for a given input image.

Resolution
640x480x3

Task type
monocular_depth_estimation

FPS
/

MicroNet-M0

Image classification with MicroNet-M0 pretrained on ImageNet.

Resolution
224x224x3

Task type
classification

FPS
34.36

InceptionV4

Image classification with InceptionV4 pretrained on ImageNet.

Resolution
299x299x3

Task type
classification

FPS
8.02

ShuffleNetV2

Image classification with ShuffleNetV2 pretrained on ImageNet.

Resolution
224x224x3

Task type
classification

FPS
151.72

GhostNet

Image classification with GhostNet pretrained on ImageNet.

Resolution
256x320x3

Task type
classification

FPS
53.13
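
The cards above are easiest to compare once they are loaded into a small table. The sketch below copies a few rows from the listing and filters them by task type and minimum FPS. The helper names (`parse_fps`, `select`) are hypothetical, and treating the placeholders "/" and "???" as "no measurement available" is an assumption about what the listing means.

```python
from typing import Optional

# (name, resolution, task type, FPS exactly as printed in the listing)
MODELS = [
    ("YoloV5 Nano", "416x416x3", "detection", "/"),
    ("YoloV4-tiny", "416x416x3", "detection", "39.8"),
    ("MegaDepth", "256x192x3", "monocular_depth_estimation", "4.8"),
    ("FastDepth", "320x256x3", "monocular_depth_estimation", "40.32"),
    ("FastDepth", "640x480x3", "monocular_depth_estimation", "???"),
    ("Depth MobileNetV2", "320x240x3", "monocular_depth_estimation", "20.03"),
    ("SC-Depth", "512x256x3", "monocular_depth_estimation", "12.89"),
]


def parse_fps(text: str) -> Optional[float]:
    """Return the FPS as a float, or None for placeholders such as '/' or '???'."""
    try:
        return float(text)
    except ValueError:
        return None  # assumption: a non-numeric entry means "not measured"


def select(models, task, min_fps):
    """Keep models of the given task type with a known FPS >= min_fps, fastest first."""
    picked = []
    for name, resolution, task_type, fps_text in models:
        fps = parse_fps(fps_text)
        if task_type == task and fps is not None and fps >= min_fps:
            picked.append((name, resolution, fps))
    return sorted(picked, key=lambda row: row[2], reverse=True)


fast_depth_models = select(MODELS, "monocular_depth_estimation", 20.0)
for name, resolution, fps in fast_depth_models:
    print(f"{name} ({resolution}): {fps} FPS")
```

Entries without a numeric FPS are skipped rather than treated as zero, so an unmeasured model is never reported as slow.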

YoloV5 Nano

No Depth
Description

Real-time object detection with YoloV5n, pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: /
  STD: /
            

License
GPL-3.0

YoloV4-tiny

No Depth
Description

Real-time object detection with YoloV4-tiny, pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: 39.8
  STD: 1.33
            

License
YOLO
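
The listing does not say how the FPS mean and STD are measured. One plausible reading is that each frame's inference time is converted to an instantaneous FPS value and the statistics are taken over those samples. The sketch below follows that assumption; the function name `fps_stats`, the sample timings, and the choice of the population standard deviation are all illustrative, not taken from the source.

```python
import statistics


def fps_stats(frame_times_s):
    """Return (mean, std) of per-frame FPS given per-frame latencies in seconds."""
    if not frame_times_s:
        raise ValueError("need at least one frame time")
    # Convert each latency to an instantaneous frames-per-second value.
    fps_samples = [1.0 / t for t in frame_times_s]
    # Assumption: population standard deviation; a sample std would use stdev().
    return statistics.mean(fps_samples), statistics.pstdev(fps_samples)


# Hypothetical latencies for four frames (not measured on any model above).
mean_fps, std_fps = fps_stats([0.025, 0.025, 0.020, 0.025])
print(f"Mean: {mean_fps:.2f}  STD: {std_fps:.2f}")
```

A small STD relative to the mean, as in most cards here, suggests a stable frame rate; a larger one, such as 2.73 on a mean of 13.65, points to more variable per-frame latency.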

Mobile Object Localizer

No Depth
Description

A class-agnostic mobile object detector.

Input
  
    Name: normalized_input_image_tensor
    Type: image
    Shape: 3, 192, 192
  
            

FPS
  Mean: 41.02
  STD: 0.01
            

License
Apache-2.0

Mask R-CNN

No Depth
Description

Instance segmentation with Mask R-CNN, pre-trained on the COCO dataset.

Input
  
    Name: 960
    Type: image
    Shape: 3, 300, 300
  
            

FPS
  Mean: 3.11
  STD: 0.01
            

License
Apache-2.0

MegaDepth

No Depth
Description

Estimate depth from an RGB image.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 192, 256
  
            

FPS
  Mean: 4.8
  STD: 0.01
            

License
MIT
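
Comparing this card's Shape (3, 192, 256) with the Resolution given for the same model earlier (256x192x3) suggests that Resolution is written as width x height x channels, while Shape is channels, height, width. That reading is inferred from the listing, not stated in it. Under that assumption, a minimal sketch of the conversion, with a hypothetical helper name, is:

```python
def resolution_to_shape(resolution: str) -> tuple:
    """Convert a 'WIDTHxHEIGHTxCHANNELS' string into a (channels, height, width) tuple."""
    width, height, channels = (int(part) for part in resolution.split("x"))
    # Assumption: the Shape field is channel-first (C, H, W).
    return (channels, height, width)


print(resolution_to_shape("256x192x3"))
```

The same mapping holds for the other non-square entries in the listing, for example 960x540x3 and Shape 3, 540, 960 for DM-Count.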

YOLOP

No Depth
Description

You Only Look at Once for Panoptic driving Perception.

Input
  
    Name: images
    Type: image
    Shape: 3, 320, 320
  
            

FPS
  Mean: 11.4
  STD: 0.01
            

License
MIT

FastDepth

No Depth
Description

Estimate depth from RGB images using FastDepth from MIT.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 320
  
            

FPS
  Mean: 40.32
  STD: 0.14
            

License
MIT

person-reidentification-retail-0031

No Depth
Description

A person reidentification model for general scenarios. It takes a whole-body image as input and outputs an embedding vector; two images are matched by the cosine distance between their embeddings. The model is based on the RMNet backbone, which was developed for fast inference. A single reidentification head on the 1/16-scale feature map outputs an embedding vector of 256 floats.

Input
  
    Name: data
    Type: image
    Shape: 3, 96, 48
  
            

FPS
  Mean: 60
  STD: 0
            

License
Apache-2.0
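
The description above says that image pairs are matched by the cosine distance between their embedding vectors. A minimal sketch of that comparison follows. The function names and the 0.5 decision threshold are hypothetical; a real threshold would have to be tuned on validation data, and the short vectors stand in for the model's 256-float embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def same_person(a, b, threshold=0.5):
    """Hypothetical decision rule: a small distance means the same identity."""
    return cosine_distance(a, b) < threshold


# Toy 4-dimensional embeddings; the model itself outputs 256 floats.
query = [0.9, 0.1, 0.0, 0.4]
gallery = [0.8, 0.2, 0.1, 0.5]
print(same_person(query, gallery))
```

Because cosine distance ignores vector length, the embeddings do not need to be normalized before comparison.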

YoloV4

No Depth
Description

Object detection with YoloV4, pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 608, 608
  
            

FPS
  Mean: 1.31
  STD: 0.01
            

License
YOLO

YoloV3

No Depth
Description

Object detection with YoloV3, pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: 4.16
  STD: 0.14
            

License
YOLO

DM-Count

No Depth
Description

Count dense or sparse crowds using density maps.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 540, 960
  
            

FPS
  Mean: 0.22
  STD: 0.01
            

License
MIT
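
In density-map crowd counting, the estimated head count is typically obtained by summing the predicted density map over all pixels. The sketch below shows that step only. It assumes the model output has already been converted to a 2D list of non-negative floats; the function name and the toy values are illustrative and are not output from this model.

```python
def count_from_density_map(density_map):
    """Sum a 2D density map (a list of rows) to get the estimated crowd count."""
    return sum(sum(row) for row in density_map)


# Toy 2x2 density map whose mass adds up to one person.
toy_map = [
    [0.0, 0.5],
    [0.25, 0.25],
]
print(count_from_density_map(toy_map))
```

The result is a real number, so an application would usually round it before displaying a count.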

FastDepth

No Depth
Description

Estimate depth from RGB images using FastDepth from MIT.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 480, 640
  
            

FPS
  Mean: ???
  STD: ???
            

License
MIT

Facial landmarks 68 detection

No Depth
Description

Detect 68 facial landmarks.

Input
  
    Name: images
    Type: image
    Shape: 3, 160, 160
  
            

FPS
  Mean: 0
  STD: 0
            

License
Apache-2.0

Depth MobileNetV2

No Depth
Description

Estimate depth from an RGB image.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 240, 320
  
            

FPS
  Mean: 20.03
  STD: 0.7
            

License
MIT

HR-Depth

No Depth
Description

Depth estimation from an RGB image using the HR-Depth model.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 192, 256
  
            

FPS
  Mean: 4.8
  STD: 0.01
            

License
MIT

Image Quality Assessment Classification

No Depth
Description

Image quality assessment from an RGB image using EdgeSegNet-Classifier.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 256
  
            

FPS
  Mean: 13.65
  STD: 2.73
            

License
MIT

SC-Depth

No Depth
Description

Depth estimation from an RGB image using the SC-Depth model.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 512
  
            

FPS
  Mean: 12.89
  STD: 0.33
            

License
GPL-3.0

EAST

No Depth
Description

Detect text in images using the EAST model.

Input
  
    Name: input_images
    Type: image
    Shape: 3, 256, 256
  
            

FPS
  Mean: 22.5
  STD: 0.22
            

License
GPL-3.0

Depth MobileNetV2

No Depth
Description

Depth estimation for a given input image.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 480, 640
  
            

FPS
  Mean: /
  STD: /
            

License
MIT

MicroNet-M0

No Depth
Description

Image classification with MicroNet-M0 pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 224, 224
  
            

FPS
  Mean: 34.36
  STD: 0.04
            

License
MIT

InceptionV4

No Depth
Description

Image classification with InceptionV4 pretrained on ImageNet.

Input
  
    Name: input
    Type: image
    Shape: 3, 299, 299
  
            

FPS
  Mean: 8.02
  STD: 0.03
            

License
Apache-2.0

ShuffleNetV2

No Depth
Description

Image classification with ShuffleNetV2 pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 224, 224
  
            

FPS
  Mean: 151.72
  STD: 2.55
            

License
BSD-3

GhostNet

No Depth
Description

Image classification with GhostNet pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 320, 256
  
            

FPS
  Mean: 53.13
  STD: 0.92
            

License
None