Oxen is a lightning-fast data version control system for large datasets. We aim to make versioning data as easy as versioning code.
The interface mirrors git, but shines in many areas where git and git-lfs fall short. Oxen is built from the ground up for any data type, is optimized to handle repositories with millions of files, and scales to terabytes of data.
oxen init
oxen add images/
oxen add annotations/*.parquet
oxen commit -m "Adding 200k images and their corresponding annotations"
oxen push origin main
Oxen consists of a command line interface, bindings for Rust 🦀 and Python 🐍, and HTTP interfaces 🌎, making it easy to integrate into your workflow.
Oxen is designed to efficiently manage large data in any format, including images, audio, video, text, or tabular data like parquet files with millions of rows. Behind the scenes, Oxen can store any blob type, but it has specialized metadata extractors for certain file types and caches this information in the merkle tree for fast access later.
One of the main reasons datasets are hard to maintain is the raw cost of indexing the data and transferring it over the network. We wanted to be able to index hundreds of thousands of images, videos, audio files, and text files in seconds.
Watch below as we version hundreds of thousands of images in seconds 🔥
But speed is only the beginning.
Oxen is built around ergonomics and ease of use, and it is easy to learn. If you know how to use git, you know how to use Oxen.
- 🔥 Fast (efficient indexing and syncing of data)
- 🧠 Easy to learn (same commands as git)
- 💪 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc)
- 🗄️ Index lots of files (millions of images? no problem)
- 📊 Native DataFrame processing (index, compare and serve up DataFrames)
- 📈 Tracks changes over time (never worry about losing the state of your data)
- 🤝 Collaborate with your team (sync to an oxen-server)
- 🌎 Workspaces to interact with the data without downloading it
- 👀 Better data visualization on OxenHub
To learn everything Oxen can do, see the full documentation at https://docs.oxen.ai.
You can install through Homebrew or pip, or from our releases page.
Install via Homebrew:
brew install oxen
Install via pip:
pip install oxenai
Clone your first Oxen repository from the OxenHub.
oxen clone https://hub.oxen.ai/ox/CatDogBBox
If you have any questions, comments, suggestions, or just want to get in contact with the team, feel free to email us at hello@oxen.ai.
This repository contains the Python library that wraps the core Rust codebase. We would love help extending the Python interfaces, the documentation, or the core Rust library.
Code bases to contribute to:
If you are building anything with Oxen.ai or have any questions, we would love to hear from you in our Discord.
Each codebase has its own build instructions; please refer to the Rust build instructions and oxen-python's build instructions for specifics.
However, all codebases share the same prerequisites and pre-commit hooks.
Use bin/install-prereqs to automatically install the required development tools and toolchains for Rust and Python. Run it as:
bin/install-prereqs
It supports macOS and Debian-based Linux distributions. If you have a different OS or distribution, or if the install script fails, you can follow the manual installation steps below.
Oxen is written purely in Rust 🦀. You should install the Rust toolchain with rustup.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Once you have Rust, install the following developer tools:
- bacon: run the server with reload-on-changes
- cargo-machete: identify and remove unused dependencies
- cargo-llvm-cov: calculate test code coverage
- cargo-sort: ensure Cargo.toml files are organized
- cargo-nextest: run unit tests
You can install all of these at once with the following commands:
cargo install bacon cargo-machete cargo-llvm-cov cargo-sort
cargo install --locked cargo-nextest
Make sure cmake is installed. cmake can be installed on macOS with:
brew install cmake
The Python interface uses liboxen bindings provided by PyO3.
The oxen-python codebase requires installing uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
If you use mise to manage your Python installs, you may run into an error where the oxen-py crate can't find the Python dynamic library to link with, e.g., dyld[31558]: Library not loaded: @rpath/libpython3.13.dylib
You can fix it by adding this to your mise config (~/.config/mise/config.toml):
[env]
DYLD_LIBRARY_PATH = "{{ exec(command='mise where python') }}/lib"
We use pre-commit-hooks to check for commit consistency.
Install with uv as a tool:
uv tool install pre-commit
Install Oxen's pre-commit hooks locally using:
pre-commit install
For deployment, build with the production feature flag and --release:
cargo build --workspace --release --features production
This enables:
- OpenTelemetry tracing (otel) -- export spans to any OTLP-compatible collector (Jaeger, Tempo, Datadog, etc.). See OpenTelemetry Tracing for runtime configuration.
- FFmpeg thumbnails (ffmpeg) -- generate video/image thumbnails via FFmpeg (requires FFmpeg libraries installed on the host).
- Performance logging (perf-logging) -- additional timing instrumentation for internal operations.
Without --features production, the default build excludes OTel dependencies and FFmpeg support, keeping the binary smaller for local development.
Oxen uses structured logging via the tracing crate. All log output goes to stderr by default in a human-readable format. This applies to the CLI (oxen), the server (oxen-server), and any code using liboxen (including the Python bindings).
Set the RUST_LOG environment variable to control verbosity.
# Show debug logs from the oxen library
RUST_LOG=debug oxen push origin main
# Show only warnings and errors
RUST_LOG=warn oxen-server start
# Fine-grained: debug for liboxen, warn for everything else
RUST_LOG=warn,liboxen=debug oxen-server start
Set OXEN_LOG_DIR to enable file-based logging in addition to stderr. This env var names a directory where rotating log files are written. Log files are written as newline-delimited JSON (one JSON object per line), rotated daily. Each line includes the timestamp, level, target, thread ID, source file, and line number.
OXEN_LOG_DIR=./logs/ RUST_LOG=warn oxen clone https://hub.oxen.ai/ox/CatDogBBox
OXEN_LOG_DIR=/var/log/oxen oxen-server start
Log files are named {app_name}.{date} (e.g. oxen-server.2026-04-06) inside the configured directory.
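As a rough illustration of the naming scheme and line format described above, the following Python sketch builds the expected log file path and filters records by level using only the standard library. The helper names (log_file_path, iter_records, with_level) are hypothetical and not part of Oxen. The only JSON key taken from this README is level, as used in the jq example; check any other key names against your actual log output before relying on them.

```python
import json
from datetime import date
from pathlib import Path
from typing import Iterable, Iterator


def log_file_path(log_dir: str, app_name: str, day: date) -> Path:
    """Build {app_name}.{date} inside log_dir, e.g. oxen-server.2026-04-06."""
    return Path(log_dir) / f"{app_name}.{day.isoformat()}"


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(record, dict):
            yield record


def with_level(records: Iterable[dict], level: str = "ERROR") -> Iterator[dict]:
    """Keep records whose "level" field equals the requested level."""
    for record in records:
        if record.get("level") == level:
            yield record


if __name__ == "__main__":
    # Assumed directory and app name, matching the examples in this README.
    path = log_file_path("/var/log/oxen", "oxen-server", date.today())
    if path.exists():
        with path.open() as handle:
            for record in with_level(iter_records(handle)):
                print(json.dumps(record))
```

Skipping malformed lines is a deliberate choice here: a file that is still being written may end in a partial record, and a one-off inspection script should not fail on it.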
To ingest these logs with standard tooling:
- Promtail / Grafana Loki -- point a file_sd or static target at the log directory; Loki handles newline-delimited JSON natively.
- Filebeat / Elasticsearch -- configure a filebeat.inputs entry with type: filestream and parsers: [{ ndjson: {} }].
- Vector -- use a file source with decoding.codec = "json".
- jq -- for ad-hoc inspection:
# Stream logs, filter for errors
tail -f ~/.oxen/logs/oxen-server.2026-04-06 | jq 'select(.level == "ERROR")'
oxen-server exposes a Prometheus-compatible metrics endpoint.
See Prometheus Metrics for details.
oxen-server can export tracing spans to any OTLP-compatible collector (Jaeger, Tempo, etc.).
Requires building with the otel feature flag.
See OpenTelemetry Tracing for details.
Span lifecycle events can be emitted as log lines on stderr for lightweight tracing. See FmtSpan Events for details.
Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use or as ergonomic as we would like.
If you have ever tried git-lfs to version large datasets and become frustrated, we feel your pain. Solutions like git-lfs are too slow at the scale of data we need for machine learning.
If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with a name like:
s3://data/images_july_2022_final_2_no_really_final.tar.gz
then you know the problem. We built Oxen to be the tool we wish we had.
"Oxen" 🐂 comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🌾. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level problems that matter to your product.
