[metrics] Support DDSketch in the parquet pipeline by mattmkim · Pull Request #6257 · quickwit-oss/quickwit

mattmkim · 2026-03-31T15:14:07Z

Description

This PR can be reviewed commit by commit.

This PR updates the parquet pipeline to process DDSketches. See https://datadoghq.atlassian.net/wiki/spaces/QKHS/pages/6291357728/DDSketch+in+Parquet for more information about the DDSketch spec.

How was this PR tested?

Describe how you tested this PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 727f085864

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

fulmicoton-dd · 2026-04-01T08:44:11Z

i cannot review this PR due to lack of context. I know what DDSketch are, but I do not know what they are used for in the context the metrics ingestion pipeline, why they are stored in different files, etc.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e7d6a90ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ppknap · 2026-04-10T14:55:32Z

+
+            let keys_inner = keys_builder.values();
+            for &k in &dp.keys {
+                keys_inner.append_value(k);


Where do we store the length of keys/counts? I.e. how do we recover the number of elements per data_point from _inner arrays?

they are stored implicitly by Arrow - it stores an array of offsets, so you can get the length of row i with offsets[i+1] - offsets[i]

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37f4298f66

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a38c47b7c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dff8e08fa0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40cfe5a158

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-14T23:06:43Z

        for metadata in &splits_metadata {
-            let insertable =
-                InsertableMetricsSplit::from_metadata(metadata, MetricsSplitState::Staged)
-                    .map_err(|e| MetastoreError::JsonSerializeError {
-                        struct_name: "MetricsSplitMetadata".to_string(),
-                        message: e.to_string(),
-                    })?;
+            let insertable = InsertableParquetSplit::from_metadata(metadata, SplitState::Staged)
+                .map_err(|err| MetastoreError::JsonSerializeError {


Reject split-kind mismatches during parquet staging

stage_parquet_splits_impl chooses the destination table from the RPC (kind) but never verifies that each ParquetSplitMetadata.kind matches that table. A mixed or misrouted request can therefore persist sketch metadata into metrics_splits (or vice versa), violating table invariants and causing downstream routing/listing logic that relies on split kind to query the wrong API/table. Add an explicit metadata.kind == kind check and fail fast on mismatch before building insert rows.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a35fdb8600

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bbfb7ff26

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bbfb7ff26

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d430881b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-15T00:23:37Z

+        let index_uid = request.index_uid().clone();
+        let splits_metadata = request.deserialize_splits_metadata()?;
+


Enforce request index UID for staged sketch metadata

stage_sketch_splits deserializes and forwards splits_metadata_json without checking that each ParquetSplitMetadata.index_uid matches request.index_uid, so a caller can submit StageSketchSplitsRequest.index_uid = A while embedding metadata for B. In file-backed mode this stores rows under index A whose serialized metadata claims B, which breaks request-scoped invariants and can cause later list/publish responses to carry inconsistent index identities; this path should validate equality (or overwrite metadata index UID from the request) before staging.

Useful? React with 👍 / 👎.

rishabh · 2026-04-16T03:27:51Z

+
+/// Returns whether the given index ID uses the Parquet/DataFusion pipeline.
+pub fn is_parquet_pipeline_index(index_id: &str) -> bool {
+    is_metrics_index(index_id) || is_sketches_index(index_id)


I think this is fine for now, but we'd be adding every parquet pipeline here every time we add one.

Also, every index that starts with sketches- doesn't necessarily have to be a Parquet pipeline.

The "fix" seems to be creating something like a ParquetMapping, and that's like a whole thing, so we can get back to it later.

rishabh · 2026-04-16T15:19:13Z

+    update_timestamp TIMESTAMP NOT NULL DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'),
+    publish_timestamp TIMESTAMP,
+
+    FOREIGN KEY(index_uid) REFERENCES indexes(index_uid)


Should we add a ON DELETE CASCADE here? If we ever need to delete this, we'd have to clean up the table ourselves first.

The metrics table also doesn't include this.

mattmkim force-pushed the matthew.kim/parquet-sketches branch from db0e0db to 86c034b Compare March 31, 2026 21:14

mattmkim changed the title ~~[draft] parquet ddsketch engine~~ [metrics] Support DDSketch in the parquet pipeline Mar 31, 2026

mattmkim marked this pull request as ready for review March 31, 2026 21:30

mattmkim force-pushed the matthew.kim/parquet-sketches branch from c3fc790 to 2261237 Compare March 31, 2026 21:38

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

Comment thread quickwit/quickwit-indexing/src/actors/publisher.rs

Comment thread quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs Outdated

Base automatically changed from matthew.kim/metrics-wide-schema to main April 6, 2026 19:48

mattmkim force-pushed the matthew.kim/parquet-sketches branch from 2261237 to 6e7d6a9 Compare April 7, 2026 18:50

chatgpt-codex-connector bot reviewed Apr 7, 2026

View reviewed changes

Comment thread quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs Outdated

mattmkim force-pushed the matthew.kim/parquet-sketches branch from 6e7d6a9 to 7490fe3 Compare April 8, 2026 19:15

g-talbot mentioned this pull request Apr 9, 2026

feat: compute deterministic timeseries_id column at ingest #6286

Merged

3 tasks

ppknap reviewed Apr 10, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Apr 14, 2026

View reviewed changes

Comment thread quickwit/quickwit-parquet-engine/src/storage/split_writer.rs Outdated

Comment thread quickwit/quickwit-parquet-engine/src/split/metadata.rs

Comment thread quickwit/quickwit-metastore/migrations/postgresql/28_create-sketch-splits.up.sql

chatgpt-codex-connector bot reviewed Apr 14, 2026

View reviewed changes

Comment thread quickwit/quickwit-parquet-engine/src/storage/writer.rs Outdated

Comment thread quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs

chatgpt-codex-connector bot reviewed Apr 14, 2026

View reviewed changes

Comment thread quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs Outdated

chatgpt-codex-connector bot reviewed Apr 14, 2026

View reviewed changes

Comment thread quickwit/quickwit-metastore/src/metastore/file_backed/file_backed_index/mod.rs

mattmkim force-pushed the matthew.kim/parquet-sketches branch from a35fdb8 to 3bbfb7f Compare April 14, 2026 23:27

chatgpt-codex-connector bot reviewed Apr 14, 2026

View reviewed changes

Comment thread quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs Outdated

Comment thread quickwit/quickwit-metastore/src/metastore/postgres/parquet_model.rs

chatgpt-codex-connector bot reviewed Apr 14, 2026

View reviewed changes

Comment thread quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs Outdated

chatgpt-codex-connector bot reviewed Apr 15, 2026

View reviewed changes

mattmkim force-pushed the matthew.kim/parquet-sketches branch from 1d43088 to ad84838 Compare April 15, 2026 05:27

mattmkim added 5 commits April 14, 2026 22:36

generalize metric splits to parquet splits

d2888d2

add sketch schema + processor

5712a06

sketch split support in metastore

c911f0b

wire sketch pipeline into indexing actors and control plane

a514964

drive by fix: pass sort fields to bloom filter config

0655391

mattmkim force-pushed the matthew.kim/parquet-sketches branch from ad84838 to 0655391 Compare April 15, 2026 05:37

rishabh approved these changes Apr 16, 2026

View reviewed changes

rishabh reviewed Apr 16, 2026

View reviewed changes

		let index_uid = request.index_uid().clone();
		let splits_metadata = request.deserialize_splits_metadata()?;

Conversation

mattmkim commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How was this PR tested?

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

fulmicoton-dd commented Apr 1, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

ppknap Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

mattmkim Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

rishabh Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

rishabh Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

mattmkim commented Mar 31, 2026 •

edited

Loading