perf: Implement physical execution of uncorrelated scalar subqueries by neilconway · Pull Request #21240 · apache/datafusion

neilconway · 2026-03-29T18:56:44Z

Which issue does this PR close?

Rationale for this change

Previously, DataFusion evaluated uncorrelated scalar subqueries by transforming them into joins. This has three shortcomings:

Scalar subqueries that return > 1 row were allowed, producing incorrect query results. Such queries should instead result in a runtime error.
Performance. Evaluating scalar subqueries as a join requires going through the join machinery. More importantly, it means that UDFs that have specialized handling of scalar inputs cannot use those code paths for scalar subqueries, which often results in significantly slower query execution (e.g., Improve performance of array_has #18181). It also makes filter pushdown for scalar subquery filters more difficult (Scalar subquery filters not pushed down to TableScan #21324)
Uncorrelated scalar subqueries previously did not work in ORDER BY or JOIN ON, or as arguments to an aggregate function. Those cases are now supported.

This PR introduces physical execution of uncorrelated scalar subqueries:

Uncorrelated subqueries are left in the plan by the optimizer, not rewritten into joins
The physical planner collects uncorrelated scalar subqueries and plans them recursively (supporting nested subqueries). We add a ScalarSubqueryExec plan node to the top of any physical plan with uncorrelated subqueries: it has N+1 children, N subqueries and its "main" input, which is the rest of the query plan. The subquery expression in the parent plan is replaced with a ScalarSubqueryExpr.
ScalarSubqueryExec manages the execution of the subqueries. Subquery evaluation is done in parallel (for a given query level), but at present it happens strictly before evaluation of the parent query. This might be improved in the future (Consider overlapping scalar subquery and parent query computation #21591).
ScalarSubqueryExpr reads its value from a shared slot that ScalarSubqueryExec populates when the subquery finishes; the physical planner assigns each subquery its slot index via ExecutionProps.

This architecture makes it easy to avoid the two shortcomings described above. Performance seems roughly unchanged (benchmarks added in this PR), but in situations like #18181, we can now leverage scalar fast-paths; in the case of #18181 specifically, this improves performance from ~800 ms to ~30 ms.

What changes are included in this PR?

Modify subquery rewriter to not transform subqueries -> joins
Collect and plan uncorrelated scalar subqueries in the physical planner, and wire up ScalarSubqueryExpr
Support for subqueries in physical plan serialization/deserialization using PhysicalProtoConverterExtension to wire up ScalarSubqueryExpr correctly
Support for subqueries in logical plan serialization/deserialization
Add various SLT tests and update expected plan shapes for some tests

Are these changes tested?

Yes. New SLT coverage for cardinality errors, ORDER BY / JOIN ON / aggregate-arg contexts, nested uncorrelated subqueries, duplicate-subquery deduplication, and partition-pruning filters; new roundtrip tests for logical and physical plan serialization.

Are there any user-facing changes?

SQL:

Uncorrelated scalar subqueries that return more than one row now result in a runtime error, instead of silently producing incorrect results.
Uncorrelated scalar subqueries now work in ORDER BY, JOIN ON, and as aggregate function arguments.

Rust APIs:

In datafusion-proto, breaking changes to Serializeable::from_bytes_with_registry (renamed to from_bytes_with_ctx), parse_expr / parse_sorts / parse_exprs, and the PhysicalProtoConverterExtension trait.

Plan shape:

LogicalPlan::Subquery nodes will now be preserved in the logical plan
Physical plans can now contain ScalarSubqueryExec plan node and ScalarSubqueryExpr expressions

The wire format has also changed to include scalar subqueries.

neilconway · 2026-03-29T18:58:53Z

-pub struct DefaultPhysicalProtoConverter;
+#[derive(Default)]
+pub struct DefaultPhysicalProtoConverter {
+    scalar_subquery_results: RefCell<Option<ScalarSubqueryResults>>,


I don't know the serialization/deserialization code well; would love feedback on whether this is the right way to do this.

This feels like a bit of an anti-pattern. I'm going to need a bit of time to dive into what's going on here, but hopefully will get to it either this afternoon or maybe Sunday evening.

I put up this PR targeting you branch as an explanation of what I mean.

The problem I have with adding state data to DefaultPhysicalProtoConverter is that now any time we have a custom proto converter that doesn't call the default, we will not be able to process these scalar subquery results.

Instead I think we just have to plumb this data member through the deserialization process. I haven't taken a super deep look into exactly how this ends up getting used to see if there's another way to take advantage. The method I used in the PR was basically to add a struct that contains all of the parts we pass through deserialization and add the scalar_subquery_results to it.

In regards to switching from FunctionRegistry -> TaskContext that's a great change. It was done part way in recent releases for the physical side but not on the logical side. It makes perfect sense to do it the way you have on the logical side.

neilconway · 2026-03-29T19:01:42Z

+            // Create the shared results container and register it (along with
+            // the index map) in ExecutionProps so that `create_physical_expr`
+            // can resolve `Expr::ScalarSubquery` into `ScalarSubqueryExpr`
+            // nodes. We clone the SessionState so these are available
+            // throughout physical planning without mutating the caller's state.
+            //
+            // Ideally, the subquery state would live in a dedicated planning
+            // context rather than on ExecutionProps (which is meant for
+            // session-level configuration). It's here because
+            // `create_physical_expr` only receives `&ExecutionProps`, and
+            // changing that signature would be a breaking public API change.
+            let results: Arc<Vec<OnceLock<ScalarValue>>> =
+                Arc::new((0..links.len()).map(|_| OnceLock::new()).collect());
+            let session_state = if links.is_empty() {
+                Cow::Borrowed(session_state)
+            } else {
+                let mut owned = session_state.clone();
+                owned.execution_props_mut().subquery_indexes = index_map;
+                owned.execution_props_mut().subquery_results = Arc::clone(&results);
+                Cow::Owned(owned)
+            };


This seemed a bit kludgy but I couldn't think of a better way to do it; feedback/suggestions welcome.

…-expr

Dandandan · 2026-03-30T17:30:45Z

run benchmarks

adriangbot · 2026-03-30T17:33:29Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156823048-606-pw9cn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-03-30T17:33:46Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156823048-607-zdt8z 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: tpcds
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-03-30T17:33:57Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156823048-608-fgcr6 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: tpch
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-03-30T17:47:05Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     neilc_scalar-subquery-expr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 45.05 / 45.95 ±0.92 / 47.44 ms │ 45.56 / 45.98 ±0.75 / 47.47 ms │    no change │
│ QQuery 2  │ 21.19 / 21.38 ±0.21 / 21.66 ms │ 21.35 / 21.56 ±0.19 / 21.91 ms │    no change │
│ QQuery 3  │ 31.73 / 32.19 ±0.50 / 33.11 ms │ 31.92 / 32.43 ±0.31 / 32.80 ms │    no change │
│ QQuery 4  │ 20.46 / 21.29 ±0.60 / 22.11 ms │ 20.37 / 21.26 ±0.79 / 22.23 ms │    no change │
│ QQuery 5  │ 48.69 / 50.41 ±1.17 / 51.92 ms │ 48.36 / 49.77 ±1.68 / 52.96 ms │    no change │
│ QQuery 6  │ 17.02 / 17.19 ±0.14 / 17.45 ms │ 17.25 / 18.05 ±1.00 / 19.84 ms │    no change │
│ QQuery 7  │ 53.55 / 54.54 ±0.56 / 55.18 ms │ 54.03 / 54.80 ±0.71 / 55.93 ms │    no change │
│ QQuery 8  │ 47.88 / 48.53 ±0.50 / 49.43 ms │ 48.31 / 49.01 ±1.02 / 51.03 ms │    no change │
│ QQuery 9  │ 54.63 / 55.50 ±0.78 / 56.86 ms │ 54.33 / 55.42 ±0.91 / 56.60 ms │    no change │
│ QQuery 10 │ 71.18 / 71.61 ±0.39 / 72.33 ms │ 69.97 / 70.95 ±0.65 / 71.66 ms │    no change │
│ QQuery 11 │ 13.76 / 14.07 ±0.24 / 14.45 ms │ 34.60 / 35.26 ±0.51 / 36.02 ms │ 2.51x slower │
│ QQuery 12 │ 27.78 / 28.16 ±0.24 / 28.52 ms │ 28.04 / 28.71 ±1.10 / 30.90 ms │    no change │
│ QQuery 13 │ 38.02 / 38.83 ±0.59 / 39.63 ms │ 38.41 / 39.41 ±0.91 / 41.05 ms │    no change │
│ QQuery 14 │ 28.51 / 28.89 ±0.32 / 29.45 ms │ 28.51 / 28.71 ±0.15 / 28.96 ms │    no change │
│ QQuery 15 │ 33.38 / 33.64 ±0.23 / 34.01 ms │ 81.32 / 82.08 ±0.58 / 82.76 ms │ 2.44x slower │
│ QQuery 16 │ 15.85 / 16.08 ±0.20 / 16.44 ms │ 15.90 / 16.18 ±0.15 / 16.30 ms │    no change │
│ QQuery 17 │ 71.98 / 72.73 ±0.44 / 73.31 ms │ 73.16 / 73.69 ±0.33 / 74.06 ms │    no change │
│ QQuery 18 │ 76.62 / 78.05 ±1.00 / 79.49 ms │ 77.03 / 79.02 ±1.36 / 80.85 ms │    no change │
│ QQuery 19 │ 37.61 / 38.00 ±0.44 / 38.76 ms │ 37.96 / 38.14 ±0.20 / 38.43 ms │    no change │
│ QQuery 20 │ 40.10 / 40.87 ±0.74 / 42.16 ms │ 40.11 / 41.48 ±1.00 / 42.90 ms │    no change │
│ QQuery 21 │ 64.14 / 65.78 ±0.89 / 66.56 ms │ 64.51 / 65.90 ±0.71 / 66.44 ms │    no change │
│ QQuery 22 │ 17.71 / 18.20 ±0.33 / 18.70 ms │ 50.61 / 51.89 ±0.92 / 53.42 ms │ 2.85x slower │
└───────────┴────────────────────────────────┴────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 891.89ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 999.71ms │
│ Average Time (HEAD)                       │  40.54ms │
│ Average Time (neilc_scalar-subquery-expr) │  45.44ms │
│ Queries Faster                            │        0 │
│ Queries Slower                            │        3 │
│ Queries with No Change                    │       19 │
│ Queries with Failure                      │        0 │
└───────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	4.7s
Peak memory	4.0 GiB
Avg memory	3.6 GiB
CPU user	33.0s
CPU sys	3.1s
Disk read	0 B
Disk write	136.0 KiB

tpch — branch

Metric	Value
Wall time	5.2s
Peak memory	4.0 GiB
Avg memory	3.6 GiB
CPU user	36.4s
CPU sys	3.2s
Disk read	0 B
Disk write	65.3 MiB

File an issue against this benchmark runner

Dandandan · 2026-03-30T17:48:24Z

run benchmark tpch10

adriangbot · 2026-03-30T17:48:48Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156947198-609-ngld5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: tpch10
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-03-30T17:53:26Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃            neilc_scalar-subquery-expr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.32 / 4.53 ±6.33 / 17.19 ms │          1.31 / 4.51 ±6.30 / 17.12 ms │     no change │
│ QQuery 1  │        14.23 / 14.53 ±0.20 / 14.86 ms │        14.35 / 14.64 ±0.17 / 14.86 ms │     no change │
│ QQuery 2  │        44.31 / 44.58 ±0.28 / 45.02 ms │        44.40 / 44.75 ±0.31 / 45.19 ms │     no change │
│ QQuery 3  │        43.28 / 44.24 ±0.70 / 45.35 ms │        44.70 / 45.73 ±0.88 / 47.18 ms │     no change │
│ QQuery 4  │     286.08 / 290.96 ±3.34 / 294.94 ms │     290.52 / 300.63 ±6.22 / 307.69 ms │     no change │
│ QQuery 5  │     343.30 / 360.49 ±9.19 / 368.08 ms │     344.37 / 347.19 ±2.44 / 350.28 ms │     no change │
│ QQuery 6  │           5.46 / 5.92 ±0.40 / 6.44 ms │           5.40 / 5.97 ±0.29 / 6.20 ms │     no change │
│ QQuery 7  │        17.17 / 19.36 ±3.27 / 25.76 ms │        16.87 / 18.58 ±2.12 / 22.74 ms │     no change │
│ QQuery 8  │     432.14 / 441.42 ±9.03 / 452.58 ms │     433.86 / 443.34 ±8.18 / 453.05 ms │     no change │
│ QQuery 9  │     665.10 / 676.08 ±9.11 / 689.32 ms │     624.49 / 635.81 ±6.97 / 645.27 ms │ +1.06x faster │
│ QQuery 10 │        92.37 / 94.29 ±1.59 / 96.89 ms │        90.27 / 93.43 ±2.54 / 97.79 ms │     no change │
│ QQuery 11 │     104.22 / 105.64 ±1.45 / 107.54 ms │     103.32 / 105.88 ±1.55 / 107.78 ms │     no change │
│ QQuery 12 │     344.34 / 349.25 ±3.25 / 353.07 ms │     345.08 / 347.85 ±1.60 / 349.45 ms │     no change │
│ QQuery 13 │     463.79 / 472.59 ±7.95 / 485.80 ms │     457.64 / 463.97 ±6.15 / 472.80 ms │     no change │
│ QQuery 14 │     350.37 / 356.22 ±3.77 / 360.54 ms │     346.54 / 352.03 ±6.32 / 364.15 ms │     no change │
│ QQuery 15 │    360.40 / 374.90 ±17.68 / 406.65 ms │    375.51 / 394.04 ±32.95 / 459.82 ms │  1.05x slower │
│ QQuery 16 │    714.01 / 738.95 ±23.24 / 774.61 ms │    728.40 / 746.45 ±14.43 / 765.84 ms │     no change │
│ QQuery 17 │    714.60 / 731.23 ±12.85 / 746.56 ms │     715.64 / 721.12 ±5.66 / 731.77 ms │     no change │
│ QQuery 18 │ 1430.78 / 1488.33 ±40.84 / 1548.80 ms │ 1379.93 / 1479.71 ±51.92 / 1528.42 ms │     no change │
│ QQuery 19 │        35.90 / 37.02 ±1.18 / 39.14 ms │        35.40 / 37.33 ±1.81 / 40.76 ms │     no change │
│ QQuery 20 │    713.45 / 735.48 ±24.51 / 771.36 ms │    712.34 / 727.29 ±14.90 / 754.80 ms │     no change │
│ QQuery 21 │     754.02 / 765.34 ±6.85 / 774.44 ms │     761.37 / 764.62 ±2.67 / 768.81 ms │     no change │
│ QQuery 22 │  1123.65 / 1128.39 ±4.69 / 1137.31 ms │  1126.97 / 1131.73 ±7.10 / 1145.76 ms │     no change │
│ QQuery 23 │ 3041.09 / 3062.25 ±18.65 / 3096.08 ms │  3033.97 / 3043.12 ±7.01 / 3055.29 ms │     no change │
│ QQuery 24 │     101.54 / 103.59 ±1.75 / 106.55 ms │      98.71 / 100.39 ±1.13 / 101.92 ms │     no change │
│ QQuery 25 │     142.10 / 142.85 ±0.49 / 143.58 ms │     136.56 / 138.17 ±0.90 / 139.25 ms │     no change │
│ QQuery 26 │     100.19 / 102.93 ±2.31 / 107.10 ms │      98.00 / 100.86 ±2.33 / 103.12 ms │     no change │
│ QQuery 27 │     849.12 / 854.43 ±7.74 / 869.79 ms │     846.66 / 853.51 ±4.77 / 857.99 ms │     no change │
│ QQuery 28 │ 7705.51 / 7745.32 ±22.00 / 7770.71 ms │ 7697.89 / 7744.14 ±31.93 / 7780.46 ms │     no change │
│ QQuery 29 │        50.77 / 55.69 ±5.09 / 65.45 ms │        50.30 / 53.99 ±4.24 / 61.53 ms │     no change │
│ QQuery 30 │     363.99 / 370.45 ±4.29 / 377.11 ms │     356.81 / 365.83 ±6.34 / 375.05 ms │     no change │
│ QQuery 31 │    362.12 / 377.82 ±11.94 / 394.11 ms │     376.70 / 380.15 ±3.81 / 386.17 ms │     no change │
│ QQuery 32 │ 1200.38 / 1267.05 ±55.53 / 1326.36 ms │ 1265.70 / 1294.94 ±27.34 / 1344.67 ms │     no change │
│ QQuery 33 │ 1460.50 / 1499.33 ±45.94 / 1580.55 ms │ 1470.47 / 1563.53 ±46.86 / 1592.95 ms │     no change │
│ QQuery 34 │  1431.98 / 1445.24 ±8.97 / 1459.07 ms │  1442.78 / 1454.41 ±8.09 / 1463.45 ms │     no change │
│ QQuery 35 │     382.15 / 386.54 ±3.26 / 390.79 ms │     379.35 / 385.51 ±7.65 / 397.78 ms │     no change │
│ QQuery 36 │     120.63 / 123.11 ±2.38 / 127.06 ms │     112.43 / 120.57 ±5.93 / 129.56 ms │     no change │
│ QQuery 37 │        48.56 / 49.41 ±0.56 / 50.23 ms │        48.03 / 50.15 ±1.57 / 52.85 ms │     no change │
│ QQuery 38 │        76.82 / 77.82 ±1.61 / 81.02 ms │        73.92 / 76.14 ±1.74 / 78.88 ms │     no change │
│ QQuery 39 │     220.70 / 223.98 ±1.85 / 226.23 ms │     204.83 / 218.14 ±7.68 / 228.76 ms │     no change │
│ QQuery 40 │        20.76 / 23.38 ±1.78 / 25.13 ms │        23.89 / 25.58 ±1.20 / 27.12 ms │  1.09x slower │
│ QQuery 41 │        20.53 / 22.07 ±1.92 / 25.72 ms │        19.67 / 20.48 ±0.58 / 21.38 ms │ +1.08x faster │
│ QQuery 42 │        19.62 / 19.98 ±0.25 / 20.37 ms │        18.69 / 20.48 ±1.77 / 23.84 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 27232.98ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 27236.70ms │
│ Average Time (HEAD)                       │   633.33ms │
│ Average Time (neilc_scalar-subquery-expr) │   633.41ms │
│ Queries Faster                            │          2 │
│ Queries Slower                            │          2 │
│ Queries with No Change                    │         39 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	137.3s
Peak memory	43.4 GiB
Avg memory	30.6 GiB
CPU user	1280.1s
CPU sys	100.5s
Disk read	0 B
Disk write	3.7 GiB

clickbench_partitioned — branch

Metric	Value
Wall time	137.3s
Peak memory	42.2 GiB
Avg memory	30.8 GiB
CPU user	1274.8s
CPU sys	105.0s
Disk read	0 B
Disk write	756.0 KiB

File an issue against this benchmark runner

…-expr

Initial implementation

4a3e553

neilconway commented Mar 29, 2026

View reviewed changes

Comment thread datafusion/physical-plan/src/scalar_subquery.rs Outdated

neilconway commented Mar 29, 2026

View reviewed changes

neilconway added 10 commits March 29, 2026 15:02

cargo fmt

6b4f5c0

Properly wait for subquery exec to complete before exec'ing main input

1412ab1

Better fix for async exec issue

cedfa5c

Fix doc lint error

d80569f

Implement logical plan serialization/deserialization for subqueries

9f606fb

cargo fmt

b07491b

Refactor logical plan deserialization

27a1ac2

Merge remote-tracking branch 'origin/main' into neilc/scalar-subquery…

bce0a6d

…-expr

Increase large files size check

7071001

fix clippy

b9bce91

github-actions bot added the development-process Related to development process of DataFusion label Mar 30, 2026

neilconway added 3 commits April 13, 2026 20:33

Refactor unit test code

0416b0d

Fix doc build

b9d307b

Doc fix

e787873

neilconway mentioned this pull request Apr 14, 2026

Deduplicate uncorrelated scalar subqueries #21619

Open

neilconway added 16 commits April 14, 2026 10:15

Tweak comments

ce1e3c0

Merge remote-tracking branch 'origin/main' into neilc/scalar-subquery…

c32edba

…-expr

Revert clippy benchmark fix

50d0ef8

Update stale comment

5935028

Use IndexSet instead of HashSet + Vec

4b60787

Reduce unnecessary cloning of subqueries

177a190

Introduce type wrapper for subquery index

cf582e3

Minor cleanups

2245153

Fix clippy

19f796c

Tweak metadata SLT query

12b4b6b

New approach to proto serialization

59a47c8

Refactor new proto approach

9ed5cf7

Update stale comments

010bc43

Various cleanups

7c7265b

Further cleanups

200f081

Add migration guide notes

58481f4

github-actions bot added the documentation Improvements or additions to documentation label Apr 16, 2026

neilconway added 9 commits April 16, 2026 15:54

Merge remote-tracking branch 'origin/main' into neilc/scalar-subquery…

47ae89c

…-expr

Fix CI failures

6d8d056

Unbreak CI, hopefully

0f3bb7b

Fix prettier issues

a7ea4c8

Fix another prettier issue

f6ab615

Fix README

975ba7c

Merge remote-tracking branch 'origin/main' into neilc/scalar-subquery…

443e05f

…-expr

Sigh, fix prettier again

5cc14ea

Merge remote-tracking branch 'origin/main' into neilc/scalar-subquery…

bd19609

…-expr

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Implement physical execution of uncorrelated scalar subqueries#21240

perf: Implement physical execution of uncorrelated scalar subqueries#21240
neilconway wants to merge 69 commits intoapache:mainfrom
neilconway:neilc/scalar-subquery-expr

neilconway commented Mar 29, 2026 •

edited

Loading

Uh oh!

neilconway Mar 29, 2026

Uh oh!

timsaucer Apr 10, 2026

Uh oh!

timsaucer Apr 12, 2026

Uh oh!

Uh oh!

neilconway Mar 29, 2026

Uh oh!

Dandandan commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

Dandandan commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Conversation

neilconway commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

neilconway Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

timsaucer Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

timsaucer Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

neilconway Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

Dandandan commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

Dandandan commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

adriangbot commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

neilconway commented Mar 29, 2026 •

edited

Loading