feat: change approx percentile/median UDFs to return floats#21074
feat: change approx percentile/median UDFs to return floats#21074theirix wants to merge 25 commits intoapache:mainfrom
Conversation
Jefffrey
left a comment
There was a problem hiding this comment.
Thanks for taking this up. Some of my concerns which held me off on this effort was the impact on error messaging (though I recall there was a recent PR to clean this up #20605) and if this deals with null type inputs correctly now (see #19458)
Also it seems some of the return types are changing now, I think we should call this out?
This reverts commit 09c0b90.
Thank you for the review! Added more tests for null - seems like it's consistent. The return value of these UDFs was float before (it is seen in UDF doc and also in other SQL engines), so it's reasonable to expect floats in tests too. Do you mean this change of types? |
Jefffrey
left a comment
There was a problem hiding this comment.
Added more tests for null - seems like it's consistent. The return value of these UDFs was float before (it is seen in UDF doc and also in other SQL engines), so it's reasonable to expect floats in tests too. Do you mean this change of types?
Yes, just referring to how it seems we used to return the input type but now we'll always coerce to float. We can see the return type changing in the SLTs, so would be good to highlight this in the title + PR body (since now we're going beyond a simple refactor)
| signature: Signature::one_of(variants, Volatility::Immutable), | ||
| } | ||
| // Additionally accept an integer number of centroids for T-Digest | ||
| let signature = Signature::one_of( |
There was a problem hiding this comment.
Since we're now coercing to floats, we can remove some of the implementation code that handles integer types
There was a problem hiding this comment.
Agree, removed some code. It also affected approx_median, piggybacking on approx_percentile_cont - updated signature and tests as well.
Clarified the scope of this PR in description - it changes more than expected
Jefffrey
left a comment
There was a problem hiding this comment.
Makes sense to change these to return floats since they are meant to be continuous instead of discrete 👍
|
Yes, it makes sense. Thank you for the review! |
|
With the output type changing, I wonder if that is a significant enough change to warrant an entry in the upgrade guide 🤔 |
Which issue does this PR close?
NUMERICS/INTEGERSindatafusion/expr-common/src/type_coercion/aggregates.rs#18092.Rationale for this change
Migrating to the modern TypeSignature API: 264030c/datafusion/expr-common/src/signature.rs
Coercing types of
approx_percentile_cont,approx_percentile_cont_with_weight,approx_medianto floats. It matches PostgreSQL, DuckDB, and ClickHouse behaviour, except for Spark.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?