Releases: roboflow/inference
v1.2.2
What's Changed
- reduce scaledown time to 3 minutes to try to reduce costs of unused containers by @rafel-roboflow in #2205
- Add Event Writer docstrings and industrial section to block docs by @rvirani1 in #2214
- Add support for local stream processing to webrtc sdk by @grzegorz-roboflow in #2200
- Use serverless usage check for auth by @hansent in #2215
- [codex] Fix GLM OCR token forwarding by @hansent in #2216
- Add sigmoid smoothing to instance segmentation post-processing of YOLO models family in inference-models by @PawelPeczek-Roboflow in #2217
- Fix missing libnvdla_compiler.so in Jetson 6.x TRT runtime by @alexnorell in #2201
- Pass `x-roboflow-internal-service-sceret` header to calls for new model registry by @PawelPeczek-Roboflow in #2219
- Add CUDA health checking to /healthz endpoint by @hansent in #2204
- Aggregate remote cold start data in workflow headers by @hansent in #2209
- Revert "Aggregate remote cold start data in workflow headers" by @PawelPeczek-Roboflow in #2222
- Trackers blocks support all detections kinds by @leeclemnet in #2221
- Fix bug with MacOS build by @PawelPeczek-Roboflow in #2223
Full Changelog: v1.2.1...v1.2.2
v1.2.1
What's Changed
- Add change to inform caller about model loading failures due to violated environment constraints by @PawelPeczek-Roboflow in #2180
- Add Introspection of Locally Cached Models by @yeldarby in #2161
- Pass ROBOFLOW_ENVIRONMENT env to modal based on PROJECT by @grzegorz-roboflow in #2179
- Fix missing sidebar nav for workflow blocks in Zensical docs by @Erol444 in #2182
- Add Workflow Profiling docs by @Erol444 in #2178
- Fix botocore.docs missing in jp71 image by @alexnorell in #2184
- fix(detection_event_log): use frame_timestamp for accurate absolute timestamps by @jeku46 in #2150
- Fix/windows build by @PawelPeczek-Roboflow in #2188
- Add CUDA memory leak profiling script by @hansent in #2193
- Bug/dg 314 show nicer errors webrtc by @rafel-roboflow in #2103
- Add Roboflow Vision Events workflow block by @rvirani1 in #2192
- Add routing metadata in model registry (and GPU VRAM tracking and reporting for loaded models) by @hansent in #2183
- [codex] Fix request metadata tracking for nested model-manager decorators by @hansent in #2198
- feat(sam3): add class_mapping support to SAM3 v3 block by @felipe-tomino in #2196
- Add memory backend for model monitoring cache by @hansent in #2194
- Add llms.txt and per-page markdown for LLM-friendly docs by @Erol444 in #2185
- Add nvidia-l40s as a valid TRT compilation target in `inference-cli` by @mkaic in #2121
- Bump version to `1.2.1` by @PawelPeczek-Roboflow in #2199
- Support yololite with fused NMS by @leeclemnet in #2203
- test(sam3): add tests for class_mapping feature in SAM3 v3 block by @felipe-tomino in #2202
- Remove enterprise-only flag from Vision Events block by @rvirani1 in #2207
New Contributors
- @felipe-tomino made their first contribution in #2196
Full Changelog: v1.2.0...v1.2.1
v1.2.0
🚀 Added
🚗 Switched to inference-models as default inference engine
As announced at the beginning of the 1.x.y release series, we've been working to make inference-models the default engine — and it's now live. The old inference backend remains available in opt-out mode.
Along with this change (and related updates to torch handling), we've updated the recommended installation flow for the inference-gpu Python package. Install torch and torchvision first — selecting the variant and CUDA index that matches your environment — then install inference-gpu:
```
pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision  # adjust CUDA version as needed
pip install inference-gpu
```
Additionally, since inference-models depends on pycuda, you'll need CUDA installed with the development toolkit (including the headers required to build pycuda). Follow the appropriate installation guide for your platform.
Tip
To continue using the old inference backend, set the environment variable USE_INFERENCE_MODELS=False.
Important
inference-models manages its cache differently from the old backend. To enable automatic model eviction in long-running containers, activate the Cache Watchdog — it monitors disk usage and removes files when storage exceeds the configured threshold.
Set MAX_INFERENCE_MODELS_CACHE_SIZE_MB to enable it. You can also control how often it runs with INFERENCE_MODELS_CACHE_WATCHDOG_INTERVAL_MINUTES. We recommend enabling this only if there's a risk of running out of disk space on your server.
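To make the trigger rule concrete, here is a toy illustration (not the actual implementation) of the watchdog's eviction condition, assuming it simply compares the measured cache size against the `MAX_INFERENCE_MODELS_CACHE_SIZE_MB` limit:

```python
import os

# Toy sketch of the Cache Watchdog's trigger rule: eviction kicks in once the
# cache directory grows past the configured limit. In the real watchdog,
# `cache_size_mb` would come from scanning the cache directory on disk.

def watchdog_should_evict(cache_size_mb: float) -> bool:
    """Return True when the cache exceeds MAX_INFERENCE_MODELS_CACHE_SIZE_MB."""
    limit = os.environ.get("MAX_INFERENCE_MODELS_CACHE_SIZE_MB")
    if limit is None:  # the watchdog is disabled when the variable is unset
        return False
    return cache_size_mb > float(limit)
```

For example, with `MAX_INFERENCE_MODELS_CACHE_SIZE_MB=10240` a cache of 12 GB would trigger eviction, while leaving the variable unset keeps the watchdog off entirely.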
🛤️ trackers 🤝 Workflows
The new Roboflow open-source library, `trackers`, has just been onboarded to Workflows.
Thanks to @leeclemnet (#2130), we have three new blocks:
| New Block | Type Slug | Algorithm |
|---|---|---|
| bytetrack/v1.py | roboflow_core/trackers_bytetrack@v1 | ByteTrack |
| sort/v1.py | roboflow_core/trackers_sort@v1 | SORT |
| ocsort/v1.py | roboflow_core/trackers_ocsort@v1 | OC-SORT |
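As a rough sketch, a tracker block plugs into a workflow definition via its type slug. Only the tracker slug below comes from the table above; every other field name (the detector step, `image`, `detections`, the output selector) is an illustrative assumption, not the documented block manifest:

```python
# Hypothetical workflow definition wiring a detector into the ByteTrack block.
# The "roboflow_core/trackers_bytetrack@v1" slug is from the release notes;
# all other field names are assumptions for illustration only.
workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "InferenceImage", "name": "image"}],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v2",  # assumed detector step
            "name": "detector",
            "image": "$inputs.image",
            "model_id": "my-project/1",  # placeholder model ID
        },
        {
            "type": "roboflow_core/trackers_bytetrack@v1",  # slug from the table above
            "name": "tracker",
            "image": "$inputs.image",
            "detections": "$steps.detector.predictions",  # assumed parameter name
        },
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "tracked",
            "selector": "$steps.tracker.tracked_detections",  # assumed output name
        }
    ],
}
```

Swapping the slug for `roboflow_core/trackers_sort@v1` or `roboflow_core/trackers_ocsort@v1` would select the other two algorithms.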
🔥 New Workflows blocks
- GLM-OCR model now has Workflows coverage - after adding the model to `inference-models` last week, @Erol444 this week made a contribution to Workflows 💪
- @jeku46 in #2171 added structured Event Write block to the pool of Enterprise plugins
Workflows Community plugins
Check out our new documentation page - Workflows Community plugins - highlighting community work around the Workflows ecosystem.
🔧 Fixed
- Add in-process LRU cache for model metadata lookups by @hansent in #2142
- Pin OTel packages to fix CPU Docker build resolution by @hansent in #2146
- Fix CI by letting tests regarding inference-gpu installation run on machines which actually have the required system libraries by @PawelPeczek-Roboflow in #2147
- Add ability to restrict max input resolution for rfdetr model by @PawelPeczek-Roboflow in #2145
- Fix/ci after switch to inference models by @PawelPeczek-Roboflow in #2149
- [CS-146] Fix issue with multi-label classification adapter returning wrong values by @dkosowski87 in #2157
- Fix issue with non-existing test asset image by @PawelPeczek-Roboflow in #2168
- Fix/clean webrtc worker shutdown on arm by @grzegorz-roboflow in #2169
- Fix keypoints stored as object-dtype arrays breaking supervision indexing by @grzegorz-roboflow in #2170
- Add orjson serialization to outputs by @PawelPeczek-Roboflow in #2165
- Add change making Roboflow Weights Provider (in inference-models) to respect license server proxy by @PawelPeczek-Roboflow in #2172
🚧 Maintenance
- Add OpenTelemetry tracing and metrics by @hansent in #2141
- Add change to make inference-models default backend by @PawelPeczek-Roboflow in #2144
- Add special handling for local API key by @yeldarby in #2153
- Bump inference-models version and clarify docs regarding installation by @PawelPeczek-Roboflow in #2162
- Add community plugins page by @PawelPeczek-Roboflow in #2163
- Cut a release by @PawelPeczek-Roboflow in #2175
- Move batch processing docs to docs.roboflow.com by @Erol444 in #2167
- Allow keypoints in velocity block by @grzegorz-roboflow in #2155
- Add detailed OTel inference sub-spans and X-Trace-Id response header by @hansent in #2148
- Do not send usage from modal only when the webrtc connection could not be established by @grzegorz-roboflow in #2173
- Add change to add job name to batch processing jobs by @PawelPeczek-Roboflow in #2143
Full Changelog: v1.1.2...v1.2.0
v1.1.2
What's Changed
- Add ability to override pre-processing per model forward-pass in `inference-models` by @PawelPeczek-Roboflow in #2120
- Fix video billing for rtsp by @digaobarbosa in #2117
- Add option to use CUDA graphs with TRT for RF-DETR Object Detection in `inference_models` by @mkaic in #1938
- feat: Add Redis-based workspace stream quota for WebRTC sessions by @rafel-roboflow in #2025
- Add implementation for inference-models cache watchdog in old inference by @PawelPeczek-Roboflow in #2122
- Create Semantic Segmentation Model workflow block by @leeclemnet in #2028
- Fix SAM3 visual segment concurrency race (DATAMAN-183) by @digaobarbosa in #2124
- Fix workflow spec fetch to fall back to file cache on timeout by @alexnorell in #2127
- GLM-OCR support by @Erol444 in #2126
- Add S3 Sink workflow block by @Jonathan-Roboflow in #2129
- Add detections area property extraction and UQL property_name_options by @dkosowski87 in #2132
- added gpt 5.4 mini/nano models to workflow blocks by @Erol444 in #2125
- Add dkosowski87 to other repo elements as codeowner by @dkosowski87 in #2135
- added reference overview page, updated inference py package about page by @Erol444 in #2119
- Feature/extend glm ocr by @PawelPeczek-Roboflow in #2133
- GLM-OCR docs by @Erol444 in #2134
- Revert change with making inference models default backend by @PawelPeczek-Roboflow in #2136
New Contributors
- @Jonathan-Roboflow made their first contribution in #2129
Full Changelog: v1.1.1...v1.1.2
v1.1.1
🚀 Added
🌀 Execution Engine v1.8.0
Steps gated by control flow (e.g. after a ContinueIf block) can now run even when they have no data-derived lineage — meaning they don't receive batch-oriented inputs from upstream steps. Lineage and execution dimensionality are now derived from control flow predecessor steps. Existing workflows are unaffected.
- 🔀 Control flow lineage — The compiler now tracks lineage coming from control flow steps (e.g. branches after `ContinueIf`). When a step has no batch-oriented data inputs but is preceded by control flow steps, its execution slices and batch structure are taken from those control flow predecessors.
- 🔓 Loosened compatibility check — Previously, steps with control flow predecessors but no data-derived lineage would fail at compile time with `ControlFlowDefinitionError`. That check is now relaxed: lineage is derived from control flow predecessors when no input data lineage exists. The strict check still runs when the step does have data-derived lineage.
- ✨ New step patterns — Steps triggered only by control flow that don't consume batch data now compile and run correctly. For example, you can send email notifications or run other side-effect steps after a `ContinueIf` without wiring any data into parameters like `message_parameters` — the step will execute once per control flow branch.
- 🐛 `Batch.remove_by_indices` nested batch fix (breaking) — When removing indices via `Batch.remove_by_indices`, nested `Batch` elements are now recursively filtered by the same index set. Previously, only the top-level batch was filtered while nested batches were left unchanged, which could cause downstream blocks to silently process `None` values or fail outright.
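To make the breaking change concrete, here is a toy model of the new behavior as described above, with plain lists standing in for `Batch` objects (the real class and its API live in the Workflows Execution Engine; this is a sketch of the semantics, not the implementation):

```python
# Toy model of the fixed Batch.remove_by_indices semantics: removing indices
# from a batch now also filters any *nested* batches by the same index set,
# instead of leaving nested batches untouched. Plain lists stand in for
# real Batch objects here.

def remove_by_indices(batch, indices_to_remove):
    kept = []
    for i, element in enumerate(batch):
        if i in indices_to_remove:
            continue  # drop this element at the top level
        if isinstance(element, list):
            # New behavior: recurse into nested batches with the same index set.
            element = remove_by_indices(element, indices_to_remove)
        kept.append(element)
    return kept
```

For example, removing index 1 from `[["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]` now yields `[["a", "c"], ["g", "i"]]`, whereas the old behavior dropped only the middle top-level batch and left the surviving nested batches unfiltered.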
Please review our change log 🥼 which outlines all introduced changes. PR: #2106 by @dkosowski87
Warning
One breaking change is included due to a bug fix in `Batch.remove_by_indices` with nested batches (see above); impact is expected to be minimal.
🚧 Maintenance
- fix version for exe/dmg download by @Erol444 in #2104
- Qwen3.5 Block Safety by @Matvezy in #2111
- Fix ONNX batch size limit fast path for fixed-batch models by @grzegorz-roboflow in #2112
- Redirect versionless models auth and metadata to new model registry when `USE_INFERENCE_MODELS=True` by @PawelPeczek-Roboflow in #2105
- feat: expose stream/pipeline metrics via Prometheus /metrics endpoint by @alexnorell in #2097
- Fix aliases mixup between rfdetr-xlarge and rfdetr-2xlarge by @Erol444 in #2054
- metadata on upload to dataset block by @digaobarbosa in #2064
- Qwen3 5 docs by @Erol444 in #2114
- Update execution engine version in tests to 1.8.0 by @dkosowski87 in #2115
Full Changelog: v1.1.0...v1.1.1
v1.1.0
ℹ️ About 1.1.0 release
This inference release brings important changes to the ecosystem:
- We have deprecated Python 3.9, which reached EOL
- We have not made `inference-models` the default backend for running predictions - this change is postponed until version `1.2.0`.
🚀 Added
🧠 Qwen3.5
Thanks to @Matvezy, inference now supports the new Qwen3.5 model.
Qwen3.5 is Alibaba's latest open-source model family (released Feb 2026), ranging from 0.8B to 397B parameters. The headline feature is native multimodal (text + vision) support. inference and Workflows support the small 0.8B-parameter version.
The model is available only with the `inference-models` backend - released in `inference-models` 0.20.0.
🪄 GPT-5.4 support
Thanks to @Erol444, the LLM Workflows block now supports GPT-5.4, keeping inference current with the latest OpenAI model lineup.
⚙️ Selectable inference backend for batch processing
Following up on the inference 1.0.0 release, Roboflow clients can now select which inference backend is used for batch processing — giving more fine-grained control when mixing legacy and new engine workloads.
Using inference-cli, one can specify which model backend will be used: `inference-models` or `old-inference`.
```
inference rf-cloud batch-processing process-images-with-workflow \
    --workflow-id <your-workflow> \
    --batch-id <your-batch> \
    --api-key <your-api-key> \
    --inference-backend inference-models

# or - for videos
inference rf-cloud batch-processing process-videos-with-workflow \
    --workflow-id <your-workflow> \
    --batch-id <your-batch> \
    --api-key <your-api-key> \
    --inference-backend inference-models
```
The same can be configured in the Roboflow App and via the HTTP integration - check out the Swagger docs.
Caution
Currently, the default backend is `old-inference`, but that will change in the near future - Roboflow clients should verify the new backend, and make the necessary adjustments in their integrations if they want to keep using the `old-inference` backend.
🦺 Maintenance
🐍 Drop of Python 3.9 and upgrade to transformers>=5
We've ported all public builds to work with versions of Python newer than 3.9; supporting 3.9 was slowing down the onboarding of new features. Thanks to this deprecation, we could migrate to `transformers>=5` and enable a new model - Qwen 3.5.
Other changes
- Fix theme build by @Erol444 in #2093
- fix docs.yml to correctly build css by @Erol444 in #2095
- fix: pinned models no longer block LRU eviction by @hansent in #2091
- Add support for pushing back to client HTTP 402 errors by @PawelPeczek-Roboflow in #2099
- Add env flag to globally disable selected inference-models backends by @PawelPeczek-Roboflow in #2096
- Added RF-DETR by @Erol444 in #2098
- Qwen3 5 improvements by @Matvezy in #2101
- Fix OpenCV ffmpeg/gstreamer support in JP51 core image by @alexnorell in #2100
- Release/1.1.0 by @PawelPeczek-Roboflow in #2102
Full Changelog: v1.0.5...v1.1.0
v1.0.5
What's Changed
- feat: add model cold start, model/workflow/workspace ID response headers by @hansent in #2052
- Support yololite object detection in inference_models with ONNX backend by @leeclemnet in #2078
- Fix the input parameter types accepted by the Ethernet IP PLC block for PLC reads/writes by @shntu in #2061
- Try to address TRT issue by @PawelPeczek-Roboflow in #2079
- Release new inference-models by @PawelPeczek-Roboflow in #2084
- Email message serialization fix by @dkosowski87 in #2083
- Support New Roboflow API Usage Paused Error 423 by @maxschridde1494 in #2082
- Expose /healthz and /readiness endpoints even if API_KEY is not set by @ecarrara in #2077
- feat: inference_models adapters respect countinference for credit verification bypass by @hansent in #2081
- Update docs by @Erol444 in #2076
- Reduce flash-attn MAX_JOBS to 1 for JP7.1 build by @alexnorell in #2068
- Bump the npm_and_yarn group across 2 directories with 10 updates by @dependabot[bot] in #2085
- Fix shared model cache race conditions causing pod crashes by @hansent in #2080
- fix inference-models pypi publishing by @grzegorz-roboflow in #2086
- Qwen3 5 and move to transformers 5 by @Matvezy in #2070
- Correct resize procedure for RF-DETR models trained on versions with non-stretch, non-square resize by @mkaic in #2067
- ENT-969: Add TestPatternStreamProducer as a built-in video source type by @NVergunst-ROBO in #2056
- fix: handle expired Redis lock release gracefully by @rafel-roboflow in #2060
- Revert/qwen 3.5 by @PawelPeczek-Roboflow in #2087
- feat: gate structured access logging behind STRUCTURED_API_LOGGING env var by @hansent in #2088
- Deploy inference-models-0.19.5 by @PawelPeczek-Roboflow in #2089
New Contributors
- @maxschridde1494 made their first contribution in #2082
- @ecarrara made their first contribution in #2077
Full Changelog: v1.0.4...v1.0.5
v1.0.4
What's Changed
- Update sam3_3d tdfy commit to latest main by @leeclemnet in #2050
- skip /usage/plan request when api key is not provided by @rafel-roboflow in #2059
- Fix issue with rfdetr-segmentation class remapping by @PawelPeczek-Roboflow in #2075
Full Changelog: v1.0.3...v1.0.4
v1.0.3
What's Changed
- Fix JP7.1 container build OOM during ORT compilation by @alexnorell in #2065
- Add AV codec dependencies to base image by @shntu in #2039
- Fix: Updated aiohttp to >=3.13.3 to address CVEs (#1949) by @thchann in #2069
- Add upper-bound constraints for aiohttp by @PawelPeczek-Roboflow in #2071
- Change the ranking priority for AutoLoader - ONNX packages over Torch by @PawelPeczek-Roboflow in #2047
- Loosening typing-extensions dependency by @PawelPeczek-Roboflow in #2072
- Prepare inference `1.0.3` release by @PawelPeczek-Roboflow in #2073
Full Changelog: v1.0.2...v1.0.3
v1.0.2
What's Changed
- Add single-tenant workflow cache mode and thread `workflow_version_id` across the stack by @alexnorell in #2031
- Add JetPack 7.1 container build workflow and CLI support by @alexnorell in #2032
- Fix: Set task_type for SegmentAnything3_3D_Objects by @leeclemnet in #2030
- feat(workflows): support custom image names in dataset upload block by @rafel-roboflow in #2034
- Expose inference configuration flags for sam3-3d by @leeclemnet in #2040
- feat(sam3): enable SDK-based remote execution for SAM3 workflow blocks by @hansent in #2042
- Add examples/sam-3d notebooks by @leeclemnet in #2043
- Add per-request 100ms duration floor via internal execution header by @hansent in #2037
- bugfix: fix version field in polygon and halo v2 visualization block manifests by @lrosemberg in #2044
- Fix large weights cdn download issue by @Matvezy in #2046
- Fix torch.compile for sam3-3d by @leeclemnet in #2041
- Fix overlapping parameter in inference-cli by @PawelPeczek-Roboflow in #2038
- Bug/dg 306 wrong workflow that doesnt raise error and provokes 500 by @rafel-roboflow in #2036
- Add output from mask measurement block to label visualization by @jeku46 in #2035
- feat: add PINNED_MODELS and PRELOAD_API_KEY for preload on serverless by @hansent in #2048
- Bump version to 1.0.2 by @PawelPeczek-Roboflow in #2051
Full Changelog: v1.0.1...v1.0.2
