Data Formulator 0.7-alpha

@Chenglong-MS Chenglong-MS released this 03 Mar 02:11
· 7 commits to main since this release
afdbe10

More Charts, New Experience, Enterprise-Ready

🚧 *This version is in fact a major redesign and probably deserves v1.0. For now, we're shipping it as 0.7-alpha for fun; a proper, detailed write-up on the new architecture is coming soon.*

Data Formulator 0.7.0 Alpha

Version: 0.7.0a1 (alpha) · Files changed: ~282 · +84k / −16k lines


What's New

📊 Dramatically Expanded Visualization Support

The chart template system has been rebuilt with a new semantic engine, expanding from ~15 chart types to 30 Vega-Lite chart types:

Chart types overview

| Category | Chart Types |
| --- | --- |
| Scatter & Point | Scatter Plot, Regression, Boxplot, Strip Plot (new), Ranged Dot Plot |
| Bar | Bar Chart, Grouped Bar Chart, Stacked Bar Chart, Histogram, Lollipop Chart (new), Pyramid Chart, Heatmap |
| Line & Area | Line Chart, Dotted Line Chart, Bump Chart (new), Area Chart (new), Streamgraph (new) |
| Part-to-Whole | Pie Chart (new), Rose Chart (new), Waterfall Chart (new) |
| Statistical | Density Plot (new), Candlestick Chart (new), Radar Chart (new) |
| Map | US Map (new), World Map (new) |
| Custom | Custom Point, Custom Line, Custom Bar, Custom Rect, Custom Area |

Semantic field analysis automatically infers temporal, categorical, quantitative, and geographic types to recommend the right chart for the data.
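
To make the idea concrete, here is a minimal sketch of how semantic field inference could drive a chart recommendation. It is not Data Formulator's actual engine: the function names, the name-hint set, and the rule table are all hypothetical, and the heuristics are deliberately crude.

```python
from datetime import datetime

# Hypothetical column-name hints; a real engine would use richer signals.
GEO_NAME_HINTS = {"country", "state", "city", "latitude", "longitude"}

# Hypothetical rule table keyed by the sorted pair of field types.
CHART_RULES = {
    ("quantitative", "temporal"): "Line Chart",
    ("categorical", "quantitative"): "Bar Chart",
    ("quantitative", "quantitative"): "Scatter Plot",
    ("geographic", "quantitative"): "World Map",
}


def _is_iso_date(text):
    """Return True if text parses as an ISO 8601 date or datetime."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def infer_semantic_type(name, values):
    """Guess a coarse semantic type from a column name and sample values."""
    present = [v for v in values if v is not None]
    if name.lower() in GEO_NAME_HINTS:
        return "geographic"
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return "quantitative"
    if present and all(isinstance(v, str) and _is_iso_date(v) for v in present):
        return "temporal"
    return "categorical"


def recommend_chart(x_type, y_type):
    """Pick a chart name for a pair of field types, with a generic fallback."""
    return CHART_RULES.get(tuple(sorted((x_type, y_type))), "Custom Point")
```

Sorting the type pair before the lookup keeps the rule table small, since the order in which the two fields are supplied does not matter.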

💬 Hybrid Chat + Data Thread & Enhanced Agent Mode

  • Redesigned Data Thread — Chat-based interaction is woven directly into the exploration thread. Users converse with agents inline alongside data transformations and chart results, replacing the separate chat panel.
  • Richer thread cards showing transformation lineage, chart previews, and agent reasoning in a unified timeline.
  • New agent mode — Agents autonomously plan multi-step explorations, generate chart recommendations, and produce data insights, all surfaced inline in the thread.
  • Conversational data loading via integrated chat-based data ingestion.

🤖 Redesigned Agent Architecture

The backend agent system has been significantly restructured — consolidating previously fragmented agents into a cleaner, more capable design:

  • Unified DataAgent replaces four separate agents (agent_py_concept_derive, agent_py_data_rec, agent_sql_data_rec, agent_sql_data_transform) with a single agent that handles both Python and SQL data transformations.
  • New agent_data_transform — Dedicated data transformation agent.
  • New agent_data_rec — Recommendation agent that suggests charts and exploration directions.
  • New agent_chart_insight — Generates natural-language insights from chart results.
  • Shared semantic_types — Type system used by both backend agents and frontend chart engine for consistent field inference.
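
As a rough illustration of what "a single agent that handles both Python and SQL" can mean, the sketch below routes a transformation plan to either a Python callable or a SQL query behind one entry point. The class and field names are hypothetical and do not reflect the real DataAgent interface; SQLite stands in for whatever SQL engine the backend actually uses.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class TransformPlan:
    """A transformation to apply (hypothetical structure)."""
    language: str  # "python" or "sql"
    code: Union[str, Callable[[list], list]]


class UnifiedAgentSketch:
    """One entry point that dispatches on the plan's language."""

    def run(self, plan, rows):
        if plan.language == "python":
            return plan.code(rows)
        if plan.language == "sql":
            return self._run_sql(plan.code, rows)
        raise ValueError(f"unsupported language: {plan.language}")

    def _run_sql(self, query, rows):
        """Load rows into an in-memory table named t and run the query."""
        if not rows:
            return []
        columns = list(rows[0].keys())
        conn = sqlite3.connect(":memory:")
        conn.row_factory = sqlite3.Row
        try:
            column_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f"CREATE TABLE t ({column_list})")
            conn.executemany(
                f"INSERT INTO t VALUES ({placeholders})",
                [tuple(row[c] for c in columns) for row in rows],
            )
            return [dict(r) for r in conn.execute(query)]
        finally:
            conn.close()
```

Callers only ever see `run`, so adding another execution path would not change how plans are submitted.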

🏗️ Workspace / Data Lake Architecture (Enterprise-Ready)

A new persistent, identity-based Workspace layer replaces the previous in-memory DB approach:

  • Workspace manages per-user directories with a workspace.yaml metadata catalog tracking every table's lineage, schema, provenance, and source type.
  • Uploaded files (CSV, Excel, JSON, etc.) preserved as-is; data-loader sources stored as Parquet via PyArrow.
  • CacheManager and FileManager for efficient caching and file lifecycle.
  • Azure Blob and Cached Azure Blob workspace backends for cloud deployments.
  • WorkspaceFactory selects the correct workspace backend from configuration.
  • New modular route layer replaces monolithic app routes.
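
The pieces above (per-user directories, a catalog that tracks lineage, and a factory that picks a backend from configuration) can be sketched roughly as follows. Everything here is an assumption made for illustration: the config keys, class names, and catalog fields are hypothetical, the catalog is kept in memory rather than written to workspace.yaml, and the blob backend is a stub that makes no network calls.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TableEntry:
    """One catalog record (hypothetical fields)."""
    name: str
    source_type: str  # e.g. "upload" or "data_loader"
    filename: str
    schema: dict = field(default_factory=dict)
    derived_from: list = field(default_factory=list)


class LocalWorkspace:
    """Keeps each user's tables under root/<user_id>."""

    def __init__(self, config, user_id):
        self.root = Path(config["root"]) / user_id
        self.root.mkdir(parents=True, exist_ok=True)
        self.catalog = {}

    def register(self, entry):
        self.catalog[entry.name] = entry

    def lineage(self, name):
        """Walk derived_from links back to the original sources."""
        seen, stack = [], list(self.catalog[name].derived_from)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.catalog[parent].derived_from)
        return seen


class BlobWorkspaceStub:
    """Placeholder for a cloud backend; only records where data would live."""

    def __init__(self, config, user_id):
        self.prefix = f"{config['container']}/{user_id}"
        self.catalog = {}


# Hypothetical backend names mapped to their classes.
BACKENDS = {"local": LocalWorkspace, "azure_blob": BlobWorkspaceStub}


def create_workspace(config, user_id):
    """Select a workspace backend from configuration."""
    backend = config.get("backend", "local")
    if backend not in BACKENDS:
        raise ValueError(f"unknown workspace backend: {backend}")
    return BACKENDS[backend](config, user_id)
```

Keeping backend selection in one function means route handlers can ask for "the workspace for this user" without knowing whether it is local or cloud-hosted.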

🔒 Security Hardening

  • Code signing for AI-generated Python code.
  • Sandboxed execution with local and Docker backends.
  • Authentication layer for user identity.
  • Flask rate limiting to protect API endpoints.
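
The release notes do not say how code signing is implemented. One common approach, shown here purely as an assumed sketch, is to attach an HMAC to generated code when the server produces it and to refuse to execute anything whose signature does not verify. The function names and key handling below are hypothetical.

```python
import hashlib
import hmac


def sign_code(code: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for a code string."""
    return hmac.new(key, code.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_code(code: str, signature: str, key: bytes) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign_code(code, key), signature)


def require_signed(code: str, signature: str, key: bytes) -> str:
    """Return the code unchanged if its signature is valid, else raise."""
    if not verify_code(code, signature, key):
        raise PermissionError("code signature mismatch; refusing to execute")
    return code
```

With a scheme like this, code that was altered after generation (for example by a client replaying a modified payload) fails verification before it ever reaches the sandbox.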

📦 Other Notable Changes

  • uv-first build: fully reproducible builds with uv.lock. Running uv sync followed by uv run data_formulator is now the recommended development workflow.
  • Unified data upload dialog and refresh data dialog.
  • Demo streaming routes for live data scenarios.
  • api-keys.env.template consolidated into .env.template.

Getting Started

```bash
# Recommended (uv)
uvx data_formulator

# Or via pip
pip install data_formulator==0.7.0a1
python -m data_formulator
```

Community Contributions

Thanks to our contributors:


Alpha notice: This is a pre-release. APIs and features may change before the stable 0.7.0 release. Please report issues and share feedback!

Full Changelog: 0.6...0.7.0a1