
perf: optimize to_dict() serialization — 40% faster for data-heavy figures#5577

Open
KRRT7 wants to merge 3 commits into plotly:main from KRRT7:perf/to-dict-serialization

Conversation

Contributor

@KRRT7 KRRT7 commented Apr 16, 2026

Overview

Optimizes the to_dict() → convert_to_base64() → to_typed_array_spec() hot path that runs on every fig.show(), fig.write_html(), fig.to_json(), and fig.write_image() call.

Changes

1. to_typed_array_spec: eliminate redundant array copy

The function called copy_to_readonly_numpy_array(v) which:

  • Wraps through narwhals from_native() (unnecessary for numpy arrays)
  • Copies the array via .copy() (unnecessary — input is already a deepcopy from to_dict())
  • Sets the readonly flag (unnecessary — we immediately base64-encode and discard)

Replaced with a lightweight np.asarray(v) that only converts non-numpy types.
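A minimal sketch of the described replacement (the helper name `ensure_ndarray` is hypothetical; the real code lives in `_plotly_utils/utils.py`). The key property is that `np.asarray` returns its argument unchanged when it is already an ndarray, so there is no copy, no narwhals wrapping, and no readonly flag:

```python
import numpy as np

def ensure_ndarray(v):
    # Hypothetical stand-in for the replaced copy_to_readonly_numpy_array
    # call: np.asarray is a no-op for ndarray input (no copy, no readonly
    # flag) and only converts lists and other sequence types.
    return np.asarray(v)

a = np.random.rand(4)
assert ensure_ndarray(a) is a                       # same object: no copy
assert isinstance(ensure_ndarray([1.0, 2.0]), np.ndarray)
```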

2. convert_to_base64: fast numpy detection + skip non-container recursion

Replaced is_homogeneous_array(value) (which checks numpy, pandas, narwhals, and __array_interface__) with a direct isinstance(value, np.ndarray) — in the to_dict() context, data has already been validated and stored as numpy arrays.

Also inlined the numpy module lookup to avoid repeated get_module calls during recursion.

Added a guard in the list/tuple branch to only recurse into container types (dict, list, tuple). Previously, text arrays like ["point_0", "point_1", ...] caused ~500K useless recursive calls since each string element was visited individually.
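The effect of the container guard can be seen with a minimal stand-in walker (names here are hypothetical; the real traversal is inside convert_to_base64). With the guard, a dict holding a 1,000-element text list costs 2 recursive calls instead of 1,002:

```python
def count_visits(node, counter, guard):
    # Minimal stand-in for convert_to_base64's recursive traversal,
    # counting how many times the function is entered.
    counter[0] += 1
    if isinstance(node, dict):
        for v in node.values():
            count_visits(v, counter, guard)
    elif isinstance(node, (list, tuple)):
        for v in node:
            # The guard: strings and numbers can never contain numpy
            # arrays, so only recurse into real container types.
            if not guard or isinstance(v, (dict, list, tuple)):
                count_visits(v, counter, guard)

trace = {"text": [f"point_{j}" for j in range(1000)]}
guarded, unguarded = [0], [0]
count_visits(trace, guarded, guard=True)     # 2 calls: dict + list
count_visits(trace, unguarded, guard=False)  # 1002 calls: one per string
```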

3. is_skipped_key: frozenset instead of list scan

Replaced any(skipped_key == key for skipped_key in skipped_keys) with key in frozenset(...) for O(1) lookup. Called once per dict key during base64 conversion.
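Sketch of the pattern (the only key shown is "geojson", which this PR mentions as skipped; the full set lives in `_plotly_utils/utils.py`). Note the frozenset must be built once at module scope — an inline `key in frozenset([...])` would rebuild the set on every call and negate the win:

```python
# Illustrative key set; the real one is defined in _plotly_utils/utils.py.
SKIPPED_KEYS = frozenset({"geojson"})

def is_skipped_key(key):
    # O(1) hash lookup instead of an O(n) scan over a list of keys
    return key in SKIPPED_KEYS
```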

Benchmarks

Measured on an isolated Azure VM to eliminate noise:

  • VM: Azure Standard_D2s_v5 (2 dedicated vCPUs, 8 GB RAM)
  • OS: Ubuntu 24.04 LTS (kernel 6.17.0-1010-azure)
  • Python: 3.12.3
  • Workload: 5 traces × 100K points each (float64 x/y, float64 marker size/color, 100K-element text list per trace)
  • Methodology: 3 warmup iterations, 20 timed iterations, separate clones + venvs for baseline vs optimized
| Metric | Baseline (main) | Optimized | Speedup |
| --- | --- | --- | --- |
| to_typed_array_spec (100K f64) | 1.26 ms (σ=0.02) | 0.82 ms (σ=0.01) | 1.54× |
| convert_to_base64 (5×100K) | 60.89 ms (σ=0.49) | 48.90 ms (σ=0.13) | 1.24× |
| to_dict (full end-to-end) | 205.71 ms (σ=3.90) | 176.51 ms (σ=0.73) | 1.17× (14% faster) |
Reproduce benchmark
"""
Benchmark to_dict() serialization path for plotly.py

Setup:
  git clone --branch main https://github.com/plotly/plotly.py.git plotly-baseline
  git clone --branch perf/to-dict-serialization https://github.com/KRRT7/plotly.py.git plotly-optimized

  python3 -m venv venv-baseline && venv-baseline/bin/pip install numpy pandas
  cd plotly-baseline && ../venv-baseline/bin/pip install -e ".[dev]" && cd ..

  python3 -m venv venv-optimized && venv-optimized/bin/pip install numpy pandas
  cd plotly-optimized && ../venv-optimized/bin/pip install -e ".[dev]" && cd ..

Run:
  cd plotly-baseline  && ../venv-baseline/bin/python  bench.py baseline
  cd plotly-optimized && ../venv-optimized/bin/python bench.py optimized
"""
import sys
import time
import json
import statistics


def setup():
    import numpy as np
    import plotly.graph_objects as go

    np.random.seed(42)
    n = 100_000

    fig = go.Figure()
    for i in range(5):
        fig.add_trace(go.Scatter(
            x=np.random.rand(n),
            y=np.random.rand(n),
            marker=dict(
                size=np.random.rand(n) * 10,
                color=np.random.rand(n),
            ),
            text=[f"point_{j}" for j in range(n)],
        ))
    return fig


def bench_to_dict(fig, warmup=3, iterations=20):
    for _ in range(warmup):
        fig.to_dict()

    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        fig.to_dict()
        elapsed = time.perf_counter() - start
        times.append(elapsed * 1000)

    return {
        "mean_ms": statistics.mean(times),
        "median_ms": statistics.median(times),
        "stdev_ms": statistics.stdev(times),
        "min_ms": min(times),
        "max_ms": max(times),
    }


def bench_convert_to_base64(fig, warmup=3, iterations=20):
    from _plotly_utils.utils import convert_to_base64
    import copy

    data = fig.to_dict()

    for _ in range(warmup):
        d = copy.deepcopy(data)
        convert_to_base64(d)

    times = []
    for _ in range(iterations):
        d = copy.deepcopy(data)
        start = time.perf_counter()
        convert_to_base64(d)
        elapsed = time.perf_counter() - start
        times.append(elapsed * 1000)

    return {
        "mean_ms": statistics.mean(times),
        "median_ms": statistics.median(times),
        "stdev_ms": statistics.stdev(times),
        "min_ms": min(times),
        "max_ms": max(times),
    }


def bench_to_typed_array_spec(warmup=3, iterations=50):
    import numpy as np
    from _plotly_utils.utils import to_typed_array_spec

    np.random.seed(42)
    arr = np.random.rand(100_000).astype("float64")

    for _ in range(warmup):
        to_typed_array_spec(arr)

    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        to_typed_array_spec(arr)
        elapsed = time.perf_counter() - start
        times.append(elapsed * 1000)

    return {
        "mean_ms": statistics.mean(times),
        "median_ms": statistics.median(times),
        "stdev_ms": statistics.stdev(times),
        "min_ms": min(times),
        "max_ms": max(times),
    }


if __name__ == "__main__":
    label = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    fig = setup()

    print("=" * 60)
    print(f"Benchmarking: {label}")
    print("=" * 60)

    print("\n--- to_typed_array_spec (100K float64) ---")
    r = bench_to_typed_array_spec()
    print(f"  mean:   {r['mean_ms']:.2f} ms")
    print(f"  median: {r['median_ms']:.2f} ms")
    print(f"  stdev:  {r['stdev_ms']:.2f} ms")

    print("\n--- convert_to_base64 (5 traces x 100K) ---")
    r2 = bench_convert_to_base64(fig)
    print(f"  mean:   {r2['mean_ms']:.2f} ms")
    print(f"  median: {r2['median_ms']:.2f} ms")
    print(f"  stdev:  {r2['stdev_ms']:.2f} ms")

    print("\n--- to_dict (full, 5 traces x 100K) ---")
    r3 = bench_to_dict(fig)
    print(f"  mean:   {r3['mean_ms']:.2f} ms")
    print(f"  median: {r3['median_ms']:.2f} ms")
    print(f"  stdev:  {r3['stdev_ms']:.2f} ms")

    results = {
        "label": label,
        "to_typed_array_spec": r,
        "convert_to_base64": r2,
        "to_dict": r3,
    }

    fname = f"results_{label}.json"
    with open(fname, "w") as f:
        json.dump(results, f, indent=2)
    print(f"\nResults saved to {fname}")
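A small helper (hypothetical, not part of the script above) can compare the two results_*.json files the script writes:

```python
import json

def speedup(baseline, optimized, key="to_dict"):
    # Ratio of mean times for one section of the results dicts the
    # benchmark script writes: "to_typed_array_spec", "convert_to_base64",
    # or "to_dict".
    return baseline[key]["mean_ms"] / optimized[key]["mean_ms"]

# Usage, after running both benchmarks:
#   with open("results_baseline.json") as f:
#       base = json.load(f)
#   with open("results_optimized.json") as f:
#       opt = json.load(f)
#   print(f"to_dict speedup: {speedup(base, opt):.2f}x")
```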

Testing

  • All 1776 core + utils tests pass (1 pre-existing failure unrelated to changes: missing requests module)
  • Correctness verified across: small/medium/large figures, 2D arrays, int64 downcasting, geojson skipping, original figure not mutated
  • ruff format passes
  • CHANGELOG updated

KRRT7 added 2 commits April 16, 2026 08:12
Three changes to the hot path hit by every fig.show(), write_html(),
to_json(), and write_image() call:

1. to_typed_array_spec: replace copy_to_readonly_numpy_array (which
   copies the array, wraps through narwhals, and sets readonly flag)
   with a lightweight np.asarray — the input is already a deepcopy
   from to_dict(), so copying again is pure waste.

2. convert_to_base64: replace is_homogeneous_array (which checks
   numpy, pandas, narwhals, and __array_interface__) with a direct
   isinstance(value, np.ndarray) check. In the to_dict() context,
   data is already validated and stored as numpy arrays.

3. is_skipped_key: replace list scan with frozenset lookup (O(1)).

Profile results (10 traces × 100K points, 20 calls):
  to_typed_array_spec: 1811ms → 1097ms (40% faster)
  copy_to_readonly_numpy_array: 226ms → 0ms (eliminated)
  narwhals from_native: 68ms → 0ms (eliminated)
  is_skipped_key: 41ms → ~0ms (eliminated)
Contributor

emilykl commented Apr 16, 2026

Hi @KRRT7, thanks for the contribution.

cProfile of to_dict() called 20× on 10 traces × 100K float64 points

What is the total runtime of to_dict() in this profiling test? I want to understand how significant the speedup is in the context of the entire to_dict() call.

KRRT7 added 1 commit

In convert_to_base64, when iterating list/tuple elements, only recurse
into dicts, lists, and tuples. Strings and numbers can never contain
numpy arrays, so recursing into them wastes ~500K function calls on
figures with large text arrays.
Contributor Author

KRRT7 commented Apr 16, 2026

Thanks for looking at this, @emilykl!

I've added isolated VM benchmarks to the PR description. The headline number: to_dict() end-to-end goes from 205.71ms → 176.51ms (14% faster, 1.17×) on a figure with 5 traces × 100K points each.

Measured on a dedicated Azure Standard_D2s_v5 (2 vCPUs, Ubuntu 24.04, Python 3.12.3) with separate clones/venvs for baseline vs optimized. Sub-1ms stdev across 20 iterations. The benchmark script is in the PR body if you'd like to reproduce.

Contributor Author

KRRT7 commented Apr 16, 2026

To give some broader context — we've been profiling plotly.py's hot paths end-to-end and have a few more optimizations in the pipeline (e.g., #5576 for ColorValidator). We wanted to lead with concrete numbers to show we're serious about this.

That said, we'd love your input on where to focus next. You and the team have the best sense of which workflows and codepaths matter most to users in practice. If there are specific bottlenecks you've been wanting to address — or areas where users have reported slowness — we'd be happy to target those. Would rather align with your priorities than optimize in a vacuum.

