Model Zoo

Ready-to-use, open-source models

YoloV7 Tiny

Real-time object detection with YoloV7-tiny pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
/

YoloV5 Nano

Real-time object detection with YoloV5n pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
/

YoloV6 Nano

Real-time object detection with YoloV6n pre-trained on the COCO dataset.

Resolution
640x640x3

Task type
detection

FPS
/

YoloV4-tiny

Real-time object detection with YoloV4-tiny pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
39.8

Mask R-CNN

Instance segmentation with Mask R-CNN pre-trained on the COCO dataset.

Resolution
300x300x3

Task type
instance_segmentation

FPS
3.11

YoloV7

Real-time object detection with YoloV7 pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
/

YOLOP

You Only Look Once for Panoptic Driving Perception.

Resolution
320x320x3

Task type
detection

FPS
11.4

YoloV6 Tiny

Real-time object detection with YoloV6t pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
/

MegaDepth

Estimate depth from an RGB image.

Resolution
256x192x3

Task type
monocular_depth_estimation

FPS
4.8

FastDepth

Estimate depth from RGB images using FastDepth from MIT.

Resolution
320x256x3

Task type
monocular_depth_estimation

FPS
40.32

Mobile Object Localizer

A class-agnostic mobile object detector.

Resolution
192x192x3

Task type
detection

FPS
41.02

YoloV6 Nano

Real-time object detection with YoloV6n pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
/

Face recognition MobileFaceNet ArcFace

Deep face recognition network with a MobileFaceNet backbone and ArcFace loss <https://arxiv.org/abs/1801.07698>

Resolution
112x112x3

Task type
face_recognition

FPS
0

Face-mask-detection

Face mask detection model, trained with the notebook available in the depthai-ml-training repository.

Resolution
300x300x3

Task type
detection

FPS
/

YoloV4

Object detection with YoloV4 pre-trained on the COCO dataset.

Resolution
608x608x3

Task type
detection

FPS
1.31

Facial landmarks 68 detection

Detect 68 facial landmarks.

Resolution
160x160x3

Task type
head_pose_estimation

FPS
0

Deeplab-V3+ person segmentation model

Deeplab-V3+ person segmentation model

Resolution
513x513x3

Task type
semantic_segmentation

FPS
0

WeChat's QR code detection model

WeChat's QR code detection model

Resolution
384x384x3

Task type
detection

FPS
0

YoloV3

Object detection with YoloV3 pre-trained on the COCO dataset.

Resolution
416x416x3

Task type
detection

FPS
4.16

SBD-mask classification

Face mask classification model

Resolution
224x224x3

Task type
classification

FPS
0

person-reidentification-retail-0031

This is a person reidentification model for a general scenario. It uses a whole body image as an input and outputs an embedding vector to match a pair of images by the cosine distance. The model is based on the RMNet backbone developed for fast inference. A single reidentification head from the 1/16 scale feature map outputs an embedding vector of 256 floats.

Resolution
48x96x3

Task type
named_entity_recognition

FPS
60

DM-Count

Count dense or sparse crowds using density maps.

Resolution
960x540x3

Task type
feature_extraction

FPS
0.22

Deeplab-V3+ person segmentation model

Deeplab-V3+ person segmentation model

Resolution
256x256x3

Task type
semantic_segmentation

FPS
0

FastDepth

Estimate depth from RGB images using FastDepth from MIT.

Resolution
640x480x3

Task type
monocular_depth_estimation

FPS
???

YuNet face detection model

YuNet face detection model

Resolution
160x120x3

Task type
detection

FPS
0

EAST

Detect text in images using the EAST model.

Resolution
256x256x3

Task type
detection

FPS
22.5

PaDiM-Wood

PaDiM anomaly detection model trained to detect anomalies in wood

Resolution
256x256x3

Task type
detection

FPS
9.62

MobileNetV2

Feature extractor with MobileNetV2 pretrained on ImageNet.

Resolution
224x224x3

Task type
feature_extraction

FPS
47.27

MicroNet-M0

Image classification with MicroNet-M0 pretrained on ImageNet.

Resolution
224x224x3

Task type
classification

FPS
34.36

SC-Depth

Depth estimation from an RGB image using the SC-Depth model.

Resolution
512x256x3

Task type
monocular_depth_estimation

FPS
12.89

Image Quality Assessment Classification

Image quality assessment from an RGB image using EdgeSegNet-Classifier.

Resolution
256x256x3

Task type
classification

FPS
13.65

Depth MobileNetV2

Estimate depth from an RGB image.

Resolution
320x240x3

Task type
monocular_depth_estimation

FPS
20.03

InceptionV4

Image classification with InceptionV4 pretrained on ImageNet.

Resolution
299x299x3

Task type
classification

FPS
8.02

ShuffleNetV2

Image classification with ShuffleNetV2 pretrained on ImageNet.

Resolution
224x224x3

Task type
classification

FPS
151.72

HR-Depth

Depth estimation from an RGB image using the HR-Depth model.

Resolution
256x192x3

Task type
monocular_depth_estimation

FPS
4.8

Mediapipe's Palm detection model

Mediapipe's Palm detection model

Resolution
128x128x3

Task type
detection

FPS
0

MediaPipe Facemesh - 468 facial landmarks

MediaPipe Facemesh model that provides 468 facial landmarks

Resolution
192x192x3

Task type
object_attributes

FPS
/

Depth MobileNetV2

Depth estimation for a given input image.

Resolution
640x480x3

Task type
monocular_depth_estimation

FPS
/

GhostNet

Image classification with GhostNet pretrained on ImageNet.

Resolution
256x320x3

Task type
classification

FPS
53.13
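
Several cards above show `/`, `???`, or `0` in the FPS field instead of a measured rate. When reading these cards programmatically, it may help to separate placeholders from real numbers. The following is a minimal sketch only: the helper name `parse_fps` is hypothetical, and treating `/` and `???` as "not measured" is an assumption, since the catalog does not define these markers. A value of `0` is passed through unchanged because its meaning is not stated either.

```python
from typing import Optional

# Assumption: these placeholders mean "no measurement available".
# The catalog itself does not define them.
PLACEHOLDERS = {"/", "???", ""}


def parse_fps(field: str) -> Optional[float]:
    """Return the FPS value as a float, or None for a placeholder field."""
    text = field.strip()
    if text in PLACEHOLDERS:
        return None
    return float(text)


# A few FPS fields copied from the cards above.
cards = {"YoloV4-tiny": "39.8", "YoloV7 Tiny": "/", "FastDepth (640x480)": "???"}

# Keep only the entries that carry a real measurement.
measured = {}
for name, field in cards.items():
    fps = parse_fps(field)
    if fps is not None:
        measured[name] = fps

print(measured)
```

Returning `None` rather than `0.0` for placeholders keeps unmeasured models from being sorted as if they were the slowest ones.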

YoloV7 Tiny

No Depth
Description

Real-time object detection with YoloV7-tiny pre-trained on the COCO dataset.

Input
  
    Name: images
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: /
  STD: /
            

License
GNU GPL-3

YoloV5 Nano

No Depth
Description

Real-time object detection with YoloV5n pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: /
  STD: /
            

License
GNU GPL-3

YoloV6 Nano

No Depth
Description

Real-time object detection with YoloV6n pre-trained on the COCO dataset.

Input
  
    Name: images
    Type: image
    Shape: 3, 640, 640
  
            

FPS
  Mean: /
  STD: /
            

License
GNU GPL-3

YoloV4-tiny

No Depth
Description

Real-time object detection with YoloV4-tiny pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: 39.8
  STD: 1.33
            

License
YOLO

Mask R-CNN

No Depth
Description

Instance segmentation with Mask R-CNN pre-trained on the COCO dataset.

Input
  
    Name: 960
    Type: image
    Shape: 3, 300, 300
  
            

FPS
  Mean: 3.11
  STD: 0.01
            

License
Apache-2.0

YoloV7

No Depth
Description

Real-time object detection with YoloV7 pre-trained on the COCO dataset.

Input
  
    Name: images
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: /
  STD: /
            

License
GNU GPL-3

YOLOP

No Depth
Description

You Only Look Once for Panoptic Driving Perception.

Input
  
    Name: images
    Type: image
    Shape: 3, 320, 320
  
            

FPS
  Mean: 11.4
  STD: 0.01
            

License
MIT

YoloV6 Tiny

No Depth
Description

Real-time object detection with YoloV6t pre-trained on the COCO dataset.

Input
  
    Name: images
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: /
  STD: /
            

License
GNU GPL-3

MegaDepth

No Depth
Description

Estimate depth from an RGB image.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 192, 256
  
            

FPS
  Mean: 4.8
  STD: 0.01
            

License
MIT

FastDepth

No Depth
Description

Estimate depth from RGB images using FastDepth from MIT.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 320
  
            

FPS
  Mean: 40.32
  STD: 0.14
            

License
MIT

Mobile Object Localizer

No Depth
Description

A class-agnostic mobile object detector.

Input
  
    Name: normalized_input_image_tensor
    Type: image
    Shape: 3, 192, 192
  
            

FPS
  Mean: 41.02
  STD: 0.01
            

License
Apache-2.0

YoloV6 Nano

No Depth
Description

Real-time object detection with YoloV6n pre-trained on the COCO dataset.

Input
  
    Name: images
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: /
  STD: /
            

License
GNU GPL-3

Face recognition MobileFaceNet ArcFace

No Depth
Description

Deep face recognition network with a MobileFaceNet backbone and ArcFace loss <https://arxiv.org/abs/1801.07698>

Input
  
    Name: data
    Type: image
    Shape: 3, 112, 112
  
            

FPS
  Mean: 0
  STD: 0
            

License

Face-mask-detection

No Depth
Description

Face mask detection model, trained with the notebook available in the depthai-ml-training repository.

Input
  
    Name: image_tensor
    Type: image
    Shape: 3, 300, 300
  
            

FPS
  Mean: /
  STD: /
            

License
None

YoloV4

No Depth
Description

Object detection with YoloV4 pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 608, 608
  
            

FPS
  Mean: 1.31
  STD: 0.01
            

License
YOLO

Facial landmarks 68 detection

No Depth
Description

Detect 68 facial landmarks.

Input
  
    Name: images
    Type: image
    Shape: 3, 160, 160
  
            

FPS
  Mean: 0
  STD: 0
            

License
Apache-2.0 License

Deeplab-V3+ person segmentation model

No Depth
Description

Deeplab-V3+ person segmentation model

Input
  
    Name: data
    Type: image
    Shape: 3, 513, 513
  
            

FPS
  Mean: 0
  STD: 0
            

License
MIT License

WeChat's QR code detection model

No Depth
Description

WeChat's QR code detection model

Input
  
    Name: input
    Type: image
    Shape: 3, 384, 384
  
            

FPS
  Mean: 0
  STD: 0
            

License
Apache License Version 2.0

YoloV3

No Depth
Description

Object detection with YoloV3 pre-trained on the COCO dataset.

Input
  
    Name: inputs
    Type: image
    Shape: 3, 416, 416
  
            

FPS
  Mean: 4.16
  STD: 0.14
            

License
YOLO

SBD-mask classification

No Depth
Description

Face mask classification model

Input
  
    Name: 32:0
    Type: image
    Shape: 3, 224, 224
  
            

FPS
  Mean: 0
  STD: 0
            

License
None

person-reidentification-retail-0031

No Depth
Description

This is a person reidentification model for a general scenario. It uses a whole body image as an input and outputs an embedding vector to match a pair of images by the cosine distance. The model is based on the RMNet backbone developed for fast inference. A single reidentification head from the 1/16 scale feature map outputs an embedding vector of 256 floats.

Input
  
    Name: data
    Type: image
    Shape: 3, 96, 48
  
            

FPS
  Mean: 60
  STD: 0
            

License
Apache

DM-Count

No Depth
Description

Count dense or sparse crowds using density maps.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 540, 960
  
            

FPS
  Mean: 0.22
  STD: 0.01
            

License
MIT

Deeplab-V3+ person segmentation model

No Depth
Description

Deeplab-V3+ person segmentation model

Input
  
    Name: data
    Type: image
    Shape: 3, 256, 256
  
            

FPS
  Mean: 0
  STD: 0
            

License
MIT License

FastDepth

No Depth
Description

Estimate depth from RGB images using FastDepth from MIT.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 480, 640
  
            

FPS
  Mean: ???
  STD: ???
            

License
MIT

YuNet face detection model

No Depth
Description

YuNet face detection model

Input
  
    Name: data
    Type: image
    Shape: 3, 120, 160
  
            

FPS
  Mean: 0
  STD: 0
            

License

EAST

No Depth
Description

Detect text in images using the EAST model.

Input
  
    Name: input_images
    Type: image
    Shape: 3, 256, 256
  
            

FPS
  Mean: 22.5
  STD: 0.22
            

License
GPL-3.0

PaDiM-Wood

No Depth
Description

PaDiM anomaly detection model trained to detect anomalies in wood

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 256
  
            

FPS
  Mean: 9.62
  STD: 0.03
            

License
Apache 2.0

MobileNetV2

No Depth
Description

Feature extractor with MobileNetV2 pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 224, 224
  
            

FPS
  Mean: 47.27
  STD: 0.03
            

License
BSD-3

MicroNet-M0

No Depth
Description

Image classification with MicroNet-M0 pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 224, 224
  
            

FPS
  Mean: 34.36
  STD: 0.04
            

License
MIT

SC-Depth

No Depth
Description

Depth estimation from an RGB image using the SC-Depth model.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 512
  
            

FPS
  Mean: 12.89
  STD: 0.33
            

License
GPL-3.0

Image Quality Assessment Classification

No Depth
Description

Image quality assessment from an RGB image using EdgeSegNet-Classifier.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 256, 256
  
            

FPS
  Mean: 13.65
  STD: 2.73
            

License
MIT

Depth MobileNetV2

No Depth
Description

Estimate depth from an RGB image.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 240, 320
  
            

FPS
  Mean: 20.03
  STD: 0.7
            

License
MIT

InceptionV4

No Depth
Description

Image classification with InceptionV4 pretrained on ImageNet.

Input
  
    Name: input
    Type: image
    Shape: 3, 299, 299
  
            

FPS
  Mean: 8.02
  STD: 0.03
            

License
Apache-2.0

ShuffleNetV2

No Depth
Description

Image classification with ShuffleNetV2 pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 224, 224
  
            

FPS
  Mean: 151.72
  STD: 2.55
            

License
BSD-3

HR-Depth

No Depth
Description

Depth estimation from an RGB image using the HR-Depth model.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 192, 256
  
            

FPS
  Mean: 4.8
  STD: 0.01
            

License
MIT

Mediapipe's Palm detection model

No Depth
Description

Mediapipe's Palm detection model

Input
  
    Name: images
    Type: image
    Shape: 3, 128, 128
  
            

FPS
  Mean: 0
  STD: 0
            

License
Apache-2.0 License

MediaPipe Facemesh - 468 facial landmarks

No Depth
Description

MediaPipe Facemesh model that provides 468 facial landmarks

Input
  
    Name: images
    Type: image
    Shape: 3, 192, 192
  
            

FPS
  Mean: /
  STD: /
            

License
Apache License Version 2.0

Depth MobileNetV2

No Depth
Description

Depth estimation for a given input image.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 480, 640
  
            

FPS
  Mean: /
  STD: /
            

License
MIT

GhostNet

No Depth
Description

Image classification with GhostNet pretrained on ImageNet.

Input
  
    Name: input.1
    Type: image
    Shape: 3, 320, 256
  
            

FPS
  Mean: 53.13
  STD: 0.92
            

License
None
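
Comparing the summary cards with the detail entries suggests that Resolution is written as width x height x channels, while Shape is written as channels, height, width. MegaDepth, for example, is listed as 256x192x3 and as 3, 192, 256. The sketch below converts between the two notations under that inferred assumption; the function names `resolution_to_shape` and `shape_to_resolution` are hypothetical and not part of any catalog API.

```python
from typing import Tuple


def resolution_to_shape(resolution: str) -> Tuple[int, int, int]:
    """Convert a card's 'WxHxC' string into a (C, H, W) tuple.

    Assumption: the card order is width, height, channels, as inferred
    from entries such as MegaDepth (256x192x3 versus 3, 192, 256).
    """
    width, height, channels = (int(part) for part in resolution.lower().split("x"))
    return (channels, height, width)


def shape_to_resolution(shape: Tuple[int, int, int]) -> str:
    """Convert a (C, H, W) tuple back into the card's 'WxHxC' string."""
    channels, height, width = shape
    return f"{width}x{height}x{channels}"


print(resolution_to_shape("256x192x3"))    # MegaDepth card
print(shape_to_resolution((3, 320, 256)))  # GhostNet detail entry
```

This matters mostly for the non-square models (MegaDepth, FastDepth, SC-Depth, GhostNet, person-reidentification-retail-0031), where swapping width and height would produce a frame of the wrong shape.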