Inference Estimator

detect v2.4.1 · COCO val2017

14 lines of Python.
Production-grade CV.

Select your deployment target. Every number below is reproducible in one Colab click.

Model Size

Input Resolution

Target Device

305 FPS

740 MB memory

44.6% mAP@50

$ detect.run(model="small", imgsz=640, device="gpu")


§ 1 — Accuracy vs. Speed

Top-right quadrant. Every time.

GPU inference at 640px input on a single A10G. Detect dots cluster where speed and accuracy peak simultaneously — the same chart, every dataset.

[Scatter plot: accuracy vs. speed. Legend: Detect, Competitors]

† All measurements: NVIDIA A10G, CUDA 12.2, PyTorch 2.2, batch size 1. Reproduce →
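The conditions in the footnote (batch size 1, a single device) imply a per-frame timing loop. A minimal sketch of such a loop follows, assuming a generic `infer` callable. `measure_fps` and `fake_infer` are hypothetical names, not part of Detect, and the project's own harness may differ. On a GPU the device would also need to be synchronized before each clock read (in PyTorch, `torch.cuda.synchronize()`), which this CPU-only stand-in omits.

```python
import time


def measure_fps(infer, frames, warmup=5):
    """Time infer() one frame at a time (batch size 1) and return frames per second."""
    # Warm-up calls are excluded from timing so one-off setup cost is not counted.
    for frame in frames[:warmup]:
        infer(frame)

    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def fake_infer(frame):
    """Hypothetical stand-in for a model call; it only burns a little CPU time."""
    return sum(frame)


frames = [list(range(1000)) for _ in range(120)]  # 120 dummy "frames"
fps = measure_fps(fake_infer, frames)
print(f"{fps:.0f} FPS")
```

The printed figure depends entirely on the machine and the stand-in workload; only the structure (warm-up, then a timed one-frame-at-a-time loop) carries over.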

§ 2 — Inference Race

Same frame. Different finish times.

Processing 120 consecutive COCO val frames. Each bar advances as frames complete. Detect finishes before competitors hit frame 80.
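A back-of-envelope check of what that claim implies, assuming the 305 FPS shown in the estimator applies to this run (the page does not say so explicitly). The function names are illustrative.

```python
def seconds_for(frames, fps):
    """Wall-clock time to process `frames` at a steady `fps`."""
    return frames / fps


def max_fps_if_behind(frames_done, elapsed_s):
    """Upper bound on average FPS for a runner that has not yet reached `frames_done` after `elapsed_s`."""
    return frames_done / elapsed_s


detect_time = seconds_for(120, 305)             # assumed 305 FPS
rival_cap = max_fps_if_behind(80, detect_time)  # runner still short of frame 80

print(f"detect finishes 120 frames in {detect_time:.3f} s")
print(f"a runner still short of frame 80 averages under {rival_cap:.1f} FPS")
```

Under that assumption, any model still short of frame 80 when Detect finishes is averaging less than two thirds of Detect's throughput, since 80/120 = 2/3.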

[Animated race over 120 frames: detect (small), YOLOv8s, Detectron2-R50, RT-DETR-L, EfficientDet-D3]

§ 3 — Ecosystem Directory

Your stack, already supported.

12 first-class integrations. ONNX to TensorRT to Coral Edge TPU — one export command, zero glue code.

ONNX Runtime · Export · stable

Export any Detect model to ONNX in one line. Deploy to any ONNX-compatible runtime.

12k GitHub stars

TensorRT · Acceleration · stable

FP16/INT8 quantization for NVIDIA GPUs. 2–3× throughput boost over raw PyTorch.

8k GitHub stars

CoreML · Apple Silicon · stable

Native Neural Engine inference on M1/M2/M3 Macs and iOS devices.

OpenVINO · Intel · stable

Optimized inference on Intel CPUs, iGPUs, and VPUs. Ideal for industrial edge.

TFLite · Mobile · stable

Android and embedded deployment. Supports delegation to GPU and Edge TPU.

Triton Inference Server · Serving · stable

Multi-model, multi-GPU serving with dynamic batching and gRPC/HTTP APIs.

Roboflow · Data · stable

One-click dataset import from Roboflow Universe. 200k+ labeled CV datasets.

Label Studio · Annotation · beta

Active learning loop: run Detect predictions, correct in Label Studio, retrain.

Weights & Biases · MLOps · stable

Automatic experiment tracking, model versioning, and benchmark dashboards.

Raspberry Pi · Edge · stable

Nano model runs at 28 FPS on Pi 5. Detect ships a Pi-optimized ONNX export.

Coral Edge TPU · Edge · stable

INT8 quantized nano model: 290 FPS on a $25 USB Coral accelerator.

Hugging Face Hub · Model Registry · beta

All official Detect checkpoints on HF Hub. One-line download with versioning.
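To relate the edge figures quoted in the directory above (28 FPS on Pi 5, 290 FPS on Coral) to a live camera feed, throughput can be converted to a per-frame time budget. A small sketch follows. The 30 FPS camera rate is an assumption chosen for illustration, not a figure from this page, and the helper names are hypothetical.

```python
def frame_time_ms(fps):
    """Average time per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def keeps_up(model_fps, camera_fps):
    """True if the model processes frames at least as fast as the camera produces them."""
    return model_fps >= camera_fps


CAMERA_FPS = 30  # assumed camera rate, for illustration only

targets = [("Raspberry Pi 5 (nano)", 28), ("Coral Edge TPU (nano, INT8)", 290)]
for name, fps in targets:
    print(f"{name}: {frame_time_ms(fps):.1f} ms/frame, "
          f"keeps up with a {CAMERA_FPS} FPS camera: {keeps_up(fps, CAMERA_FPS)}")
```

Under that assumed camera rate, the Pi figure falls slightly short (about one frame in fifteen would have to be skipped), while the Coral figure leaves ample headroom.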


§ 4 — The 14 Lines

The entire API. Not a tutorial excerpt.

This is complete working code — detection, segmentation, pose estimation, and streaming. No boilerplate hidden below the fold.

detect_birds.py

import detect

# Load a model: nano to large
model = detect.load("small")

# Run on any source: file, URL, webcam, RTSP
results = model.predict(
    source="./birds.mp4",
    conf=0.25,
    device="cuda",
    stream=True,
)

# Every result: boxes, masks, keypoints, classes
for frame in results:
    frame.show()  # annotated + timed
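The `stream=True` flag suggests that results are yielded one frame at a time rather than collected into a list first. The sketch below illustrates that lazy-iteration pattern with a plain Python generator. `fake_predict` and its dictionary results are hypothetical stand-ins, not the Detect API, and the real return type may differ.

```python
def fake_predict(num_frames, log):
    """Hypothetical stand-in for a streaming predict call: yields one result per frame."""
    for index in range(num_frames):
        log.append(index)  # record when each frame is actually processed
        yield {"frame": index, "boxes": []}


processed = []
results = fake_predict(3, processed)

# Nothing has run yet: a generator does no work until it is iterated.
assert processed == []

first = next(results)
# Only the first frame has been processed; the rest wait until requested.
assert processed == [0]

remaining = list(results)
print(len(remaining), processed)
```

If the library behaves this way, memory use stays flat on long videos or RTSP streams, because each result can be discarded before the next frame is decoded.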

Lines to production: 14. Not a toy demo; a real inference pipeline.

Export formats: 35+, including ONNX, TensorRT, CoreML, TFLite, and OpenVINO.

Configuration files: none. No YAML required; sensible defaults ship with the package.

Colab reproduce: 1-click. Every benchmark chart on this page, live.
