Port filter_pushdown.rs async tests to sqllogictest by adriangb · Pull Request #21620 · apache/datafusion

adriangb · 2026-04-14T14:21:39Z

Which issue does this PR close?

Closes #.

Rationale for this change

#21160 added `datafusion.explain.analyze_categories`, which lets `EXPLAIN ANALYZE` emit only deterministic metric categories (e.g. 'rows'). That removed a long-standing blocker to porting tests out of `datafusion/core/tests/physical_optimizer/filter_pushdown.rs`. Previously these tests had to assert on execution state via insta snapshots over hand-wired `ExecutionPlan` trees and mock `TestSource` data, which made them expensive to read and update and meant they could not be exercised through the user-facing SQL path.

With `analyze_categories = 'rows'`, the `predicate=DynamicFilter [ ... ]` text on a parquet scan is stable across runs, so the same invariants can now be expressed as plain `EXPLAIN ANALYZE` SQL in sqllogictest. There they are easier to read and update, and they exercise the full SQL → logical optimizer → physical optimizer → execution pipeline rather than a single optimizer rule in isolation.
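To illustrate why restricting metric categories makes the output stable, here is a minimal Rust sketch under stated assumptions: `Category`, `Metric`, and `render_metrics` are hypothetical names, not DataFusion's metrics API. It only models the idea that run-dependent values (timings, bytes) are dropped before rendering, so two runs that differ only in timing print the same text.

```rust
/// Hypothetical metric categories (illustrative only, not DataFusion's types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Category {
    Rows,
    Bytes,
    Timing,
}

/// Hypothetical metric: a name, its category, and the value observed in one run.
struct Metric {
    name: &'static str,
    category: Category,
    value: u64,
}

/// Render only the metrics whose category is in `allowed`, joined as `name=value`.
/// An empty `allowed` slice renders nothing, loosely mirroring a 'none' setting.
fn render_metrics(metrics: &[Metric], allowed: &[Category]) -> String {
    metrics
        .iter()
        .filter(|m| allowed.contains(&m.category))
        .map(|m| format!("{}={}", m.name, m.value))
        .collect::<Vec<_>>()
        .join(", ")
}
```

In this model, rendering with only `Category::Rows` yields identical text for two runs whose timing values differ, while including `Category::Timing` does not.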

What changes are included in this PR?

24 async end-to-end filter-pushdown tests are ported out of `filter_pushdown.rs` and deleted from it. The helpers `run_aggregate_dyn_filter_case` and `run_projection_dyn_filter_case` (and their supporting structs) are deleted along with the tests that used them. The 24 synchronous `#[test]` tests, which exercise the optimizer rule in isolation, are untouched: they stay in Rust because they specifically exercise `FilterPushdown::new()` / `OptimizationTest` over a hand-built plan.

`datafusion/sqllogictest/test_files/push_down_filter_parquet.slt`

New tests covering:

- TopK dynamic filter pushdown integration (100k-row parquet, `max_row_group_size = 128`, asserting on `pushdown_rows_matched = 128` / `pushdown_rows_pruned = 99.87 K`)
- TopK single-column and multi-column (compound-sort) dynamic filter shapes
- HashJoin `CollectLeft` dynamic filter with `struct(a, b) IN (SET) ([...])` content
- Nested hash joins propagating filters to both inner scans
- Parent `WHERE` filter splitting across the two sides of a HashJoin
- TopK above HashJoin, with both dynamic filters ANDed on the probe scan
- Dynamic filter flowing through a `GROUP BY` that sits between a HashJoin and the probe scan
- TopK projection rewrite: reorder, prune, expression, alias shadowing
- NULL-bearing build-side join keys
- `LEFT JOIN` and `LEFT SEMI JOIN` dynamic filter pushdown
- HashTable strategy (`hash_lookup`) via `hash_join_inlist_pushdown_max_size = 1`, on both string and integer multi-column keys
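The TopK cases can be pictured with a minimal Rust sketch of the idea (an assumption-laden model, not DataFusion's implementation): for `ORDER BY a ASC LIMIT k`, once the operator has buffered k rows, the largest buffered value becomes a threshold, and the scan below can skip any row group whose minimum is not below it. `TopK` and `scan_with_dynamic_filter` are hypothetical names, and the per-group `min()` stands in for parquet row-group statistics.

```rust
use std::collections::BinaryHeap;

/// Hypothetical TopK for `ORDER BY a ASC LIMIT k` that exposes a dynamic
/// filter threshold to the scan below it.
struct TopK {
    k: usize,
    heap: BinaryHeap<i64>, // max-heap holding the k smallest values seen so far
}

impl TopK {
    fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::new() }
    }

    /// Once k values are buffered, any row with `a >= threshold` cannot
    /// enter the result, so the dynamic filter is `a < threshold`.
    fn threshold(&self) -> Option<i64> {
        if self.heap.len() == self.k {
            self.heap.peek().copied()
        } else {
            None
        }
    }

    fn insert(&mut self, v: i64) {
        self.heap.push(v);
        if self.heap.len() > self.k {
            self.heap.pop(); // evict the current largest value
        }
    }
}

/// Scan row groups, skipping a whole group when its minimum already fails the
/// dynamic filter. Returns the sorted top-k values and the number of skipped rows.
fn scan_with_dynamic_filter(row_groups: &[Vec<i64>], k: usize) -> (Vec<i64>, usize) {
    let mut topk = TopK::new(k);
    let mut rows_skipped = 0;
    for group in row_groups {
        // Stand-in for a parquet row-group min statistic.
        let min = group.iter().copied().min();
        if let (Some(t), Some(m)) = (topk.threshold(), min) {
            if m >= t {
                rows_skipped += group.len();
                continue;
            }
        }
        for &v in group {
            // Row-level evaluation of the same `a < threshold` predicate.
            if topk.threshold().map_or(true, |t| v < t) {
                topk.insert(v);
            }
        }
    }
    let mut out = topk.heap.into_vec();
    out.sort();
    (out, rows_skipped)
}
```

With k = 2 and row groups `[5, 3, 9]`, `[1, 2, 8]`, `[10, 11, 12]`, `[7, 6, 4]`, the sketch returns `[1, 2]` and skips the last two groups (6 rows) without reading their values, because their minimums are not below the threshold of 2.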
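The two probe-side filter shapes for a `CollectLeft` hash join can be sketched the same way. This is a hypothetical model, not DataFusion's code: `ProbeFilter` and `build_probe_filter` are made-up names, and counting distinct keys against `max_inlist` is a stand-in for however `hash_join_inlist_pushdown_max_size` actually measures size. Build-side rows with a NULL key are dropped because, for ordinary `=` join keys, NULL never matches anything.

```rust
use std::collections::HashSet;

/// Hypothetical probe-side dynamic filter built from a collected build side.
#[derive(Debug, PartialEq)]
enum ProbeFilter {
    /// Small build side: a literal list of (string, integer) key pairs.
    InList(Vec<(String, i64)>),
    /// Large build side: membership lookup against a hash set of keys.
    HashLookup(HashSet<(String, i64)>),
}

impl ProbeFilter {
    /// Does a probe-side key pass the dynamic filter?
    fn matches(&self, key: &(String, i64)) -> bool {
        match self {
            ProbeFilter::InList(list) => list.contains(key),
            ProbeFilter::HashLookup(set) => set.contains(key),
        }
    }
}

/// Choose between an in-list and a hash lookup based on distinct key count.
fn build_probe_filter(build_keys: &[(Option<String>, Option<i64>)], max_inlist: usize) -> ProbeFilter {
    let mut seen = HashSet::new();
    let mut distinct = Vec::new(); // preserves first-seen order for the in-list
    for (a, b) in build_keys {
        // Rows with any NULL key component can never satisfy `=`, so skip them.
        if let (Some(a), Some(b)) = (a, b) {
            let key = (a.clone(), *b);
            if seen.insert(key.clone()) {
                distinct.push(key);
            }
        }
    }
    if distinct.len() <= max_inlist {
        ProbeFilter::InList(distinct)
    } else {
        ProbeFilter::HashLookup(seen)
    }
}
```

Setting the limit to 1, as the tests do with `hash_join_inlist_pushdown_max_size = 1`, pushes any build side with more than one key onto the hash-lookup path in this model.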

`datafusion/sqllogictest/test_files/push_down_filter_regression.slt`

New tests covering:

- Aggregate dynamic filter baseline: `MIN(a)`, `MAX(a)`, `MIN(a)` + `MAX(a)`, `MIN(a)` + `MAX(b)`, mixed MIN/MAX with an unsupported expression input, all-NULL input (filter stays `true`), `MIN(a+1)` (no filter emitted)
- `WHERE` filter on a grouping column pushes through `AggregateExec`
- `HAVING count(b) > 5` filter stays above the aggregate
- End-to-end aggregate dynamic filter actually pruning a multi-file parquet scan
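The aggregate baseline cases can be read against a small Rust model of a MIN/MAX dynamic filter. It is a sketch under stated assumptions, not DataFusion's implementation, and `AggState`, `Input`, and `dynamic_filter` are hypothetical names. A row can only change `MIN(a)` if `a < current_min` (and `MAX(b)` if `b > current_max`), so the terms are ORed; while an aggregate has seen only NULLs the filter is `true`; and in this model an input that is not a bare column (like `a+1`) disables the filter, since a row rejected by the other terms could still change that aggregate.

```rust
/// Hypothetical aggregate function kind.
#[derive(Clone, Copy)]
enum AggKind {
    Min,
    Max,
}

/// Hypothetical aggregate input: a bare column, or any other expression (e.g. `a+1`).
#[derive(Clone, Copy)]
enum Input {
    Column(&'static str),
    Other,
}

/// Running state of one MIN/MAX aggregate.
struct AggState {
    kind: AggKind,
    input: Input,
    current: Option<i64>, // None until a non-NULL value has been seen
}

impl AggState {
    fn new(kind: AggKind, input: Input) -> Self {
        Self { kind, input, current: None }
    }

    /// Fold one input value into the running MIN/MAX; NULLs are ignored.
    fn update(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            self.current = Some(match (self.kind, self.current) {
                (_, None) => v,
                (AggKind::Min, Some(c)) => c.min(v),
                (AggKind::Max, Some(c)) => c.max(v),
            });
        }
    }
}

/// Render the scan-side dynamic filter text, or None if no filter is emitted.
fn dynamic_filter(aggs: &[AggState]) -> Option<String> {
    let mut columns = Vec::with_capacity(aggs.len());
    for agg in aggs {
        match agg.input {
            Input::Column(c) => columns.push(c),
            // A non-column input disables the whole filter in this model.
            Input::Other => return None,
        }
    }
    if columns.is_empty() {
        return None;
    }
    // Until every aggregate has a value, no row can be safely skipped.
    if aggs.iter().any(|agg| agg.current.is_none()) {
        return Some("true".to_string());
    }
    let terms: Vec<String> = aggs
        .iter()
        .zip(columns)
        .filter_map(|(agg, col)| {
            agg.current.map(|v| match agg.kind {
                AggKind::Min => format!("{col} < {v}"),
                AggKind::Max => format!("{col} > {v}"),
            })
        })
        .collect();
    Some(terms.join(" OR "))
}
```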
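The `WHERE`-versus-`HAVING` cases follow a column-based rule, sketched below as a hypothetical Rust model (`Conjunct` and `split_for_aggregate` are made-up names, and a real planner reasons over expression trees rather than lists of column names): a conjunct that references only grouping columns gives the same answer before or after the aggregate, so it can move below it, while a conjunct over an aggregate output such as `count(b)` cannot.

```rust
use std::collections::HashSet;

/// Hypothetical predicate conjunct: its text plus the columns it references.
struct Conjunct {
    text: &'static str,
    columns: Vec<&'static str>,
}

/// Split conjuncts into (pushed below the aggregate, kept above it).
fn split_for_aggregate(
    conjuncts: &[Conjunct],
    group_by: &[&str],
) -> (Vec<&'static str>, Vec<&'static str>) {
    let keys: HashSet<&str> = group_by.iter().copied().collect();
    let mut below = Vec::new();
    let mut above = Vec::new();
    for c in conjuncts {
        // Only conjuncts whose columns are all grouping keys are safe to push down.
        if c.columns.iter().all(|col| keys.contains(*col)) {
            below.push(c.text);
        } else {
            above.push(c.text);
        }
    }
    (below, above)
}
```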

The aggregate baseline tests run under `analyze_level = summary` plus `analyze_categories = 'none'`, so metrics render empty and only the `predicate=DynamicFilter [ ... ]` content remains. The filter text is deterministic even though the pruning counts depend on parallel-execution scheduling.

What stayed in Rust

Ten async tests now carry a short `// Not portable to sqllogictest: …` header explaining why. In short, they either:

- Hand-wire `PartitionMode::Partitioned` or a `RepartitionExec` boundary that SQL never constructs at the data sizes these tests use
- Assert via debug-only APIs (`HashJoinExec::dynamic_filter_for_test().is_used()`, `ExecutionPlan::apply_expressions()` + `downcast_ref::<DynamicFilterPhysicalExpr>`) that are not observable from SQL
- Target the specific stacked-`FilterExec` shape from the #20109 regression ("FilterPushdown do not generate correct column index when merge FilterExec"), which the logical optimizer collapses before physical planning

Are these changes tested?

Yes. The ported slt cases are themselves the tests. Each one was generated with `cargo test -p datafusion-sqllogictest --test sqllogictests -- <file> --complete`, then re-run twice back-to-back without `--complete` to confirm determinism. The remaining Rust filter_pushdown tests continue to pass (`cargo test -p datafusion --test core_integration filter_pushdown` → 47 passed, 0 failed). `cargo clippy -p datafusion --tests -- -D warnings` and `cargo fmt --all` are clean.

Test plan

- `cargo test -p datafusion-sqllogictest --test sqllogictests -- push_down_filter`
- `cargo test -p datafusion --test core_integration filter_pushdown`
- `cargo clippy -p datafusion --tests -- -D warnings`
- `cargo fmt --all`

Are there any user-facing changes?

No. This is a test-only refactor.

🤖 Generated with Claude Code

Port 24 end-to-end filter-pushdown tests out of `datafusion/core/tests/physical_optimizer/filter_pushdown.rs` into the sqllogictest suite. The new `datafusion.explain.analyze_categories` session config lets `EXPLAIN ANALYZE` emit only deterministic metric categories ('rows'), so these tests can assert directly on the `predicate=DynamicFilter [ ... ]` text without `<slt:ignore>` scrubbing around timing/bytes.

## What moved

New/extended tests in `datafusion/sqllogictest/test_files/push_down_filter_parquet.slt`:

- TopK dynamic filter pushdown (single-col, multi-col sort, integration with max_row_group_size=128 and pushdown_rows_matched / pushdown_rows_pruned counters)
- HashJoin CollectLeft dynamic filter with `struct(a, b) IN (SET)` shape
- Nested hash joins (filter propagates to both inner scans)
- Parent filter split across the two sides of a HashJoin
- TopK above HashJoin (both dynamic filters ANDed on the probe scan)
- Dynamic filter through a GROUP BY between HashJoin and probe scan
- TopK projection rewrite (reorder, prune, expression, alias shadowing)
- NULL-bearing build-side join keys
- LEFT JOIN and LEFT SEMI JOIN dynamic filter pushdown
- HashTable strategy (`hash_lookup`) via `hash_join_inlist_pushdown_max_size = 1` on both string and integer multi-column keys

New tests in `datafusion/sqllogictest/test_files/push_down_filter_regression.slt`:

- Aggregate dynamic filter baseline: MIN(a), MAX(a), MIN(a) + MAX(a), MIN(a) + MAX(b), mixed MIN/MAX with unsupported expression input, all-NULL input (filter stays `true`), MIN(a+1) (no filter emitted)
- Filter on grouping column pushes through AggregateExec
- Filter on aggregate result (HAVING count > 5) stays above the aggregate
- End-to-end aggregate dynamic filter pruning a multi-file parquet scan

## What stayed in Rust

Ten async tests were marked non-portable with a short comment explaining why. In short: they either hand-wire `PartitionMode::Partitioned` / `RepartitionExec` structures SQL never constructs, assert via debug APIs (`dynamic_filter_for_test()`, `apply_expressions` + `downcast_ref::<DynamicFilterPhysicalExpr>`) that are not observable from SQL, or target the specific stacked-`FilterExec` shape that the logical optimizer collapses before physical planning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added the `core` (Core DataFusion crate) and `sqllogictest` (SQL Logic Tests (.slt)) labels on Apr 14, 2026

adriangb requested a review from alamb April 14, 2026 14:40

adriangb wants to merge 1 commit into apache:main from adriangb:port-filter-pushdown-tests-to-slt