-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: simplify/optimize/fix the aggregate functions. #8680
Conversation
@nvanbenschoten @RaduBerinde if you have any suggestion as to why int sum and average become slower, I'll take it. Thanks! |
fe64f53
to
d8a53a0
Compare
@knz can you include |
@dt why me? :) |
The int may be slower because we used to call It is really unfortunate we have to go through decimals for all int aggregations. I assume the reason is overflow? We could use Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
on the agg handling side, though I defer to @nvanbenschoten on some of the impl changes since I think he's been in there most recently. Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
Thanks @RaduBerinde for the hint. I have simplified the int sum as you suggest and indeed it thus becomes faster. I also implemented the suggested optimization to use ints for the sum until an overflow occurs, however the overflow check itself is rather expensive (time wise) so there isn't a big difference in the benchmark. Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, some commit checks pending. Comments from Reviewable |
Also I put the benchmem results in the commits message as you suggested @dt. |
General approach . Could you add a breif enumeration of the simplifications/optimizations you made in the commit message. Something like:
Interestingly, WRT the addition in an Reviewed 3 of 4 files at r1, 1 of 1 files at r2. sql/parser/aggregate_builtins.go, line 425 [r2] (raw file):
This deserves a comment. Comments from Reviewable |
@knz Nice! Note that the existing benchmarks don't really exercise the optimization (they use random 63 bit integers so they are likely to overflow very soon). Maybe add a SmallInt version of the benchmarks (and a Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. Comments from Reviewable |
@RaduBerinde I have added the small int benchmarks as you suggested and updated the commit message with the new benchmark results. @nvanbenschoten I extended the commit message accordingly. Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. sql/parser/aggregate_builtins.go, line 425 [r2] (raw file):
|
Prior to this patch, `varianceAggregate`, `sumAggregate` (and indirectly `avg` and `stddev` aggregates which used the former) tried to be "smart" and share code between various argument types. However this actually caused extra complexity and run-time overhead. Indeed, every time the Add method was called: - a type switch was performed on the argument (although every value added to a given variance aggregator will always be of the same type) - a check was made whether an implementation for its argument's type had been instantiated already, to instantiate it if needed (although given the argument type is known after type checking, this instantiation could have been performed prior to the first Add call) This patch addresses this shortcoming by separating implementations by argument type. In addition, minor optimizations were also implemented: - the error return value was removed, since there is no error condition. - the `sum` (and thus `avg`) aggregates now use a single `int64` as long as the sum does not overflow. Also a long-standing bug with COUNT was fixed (the COUNT of a tuple of NULL values is 1, not 0). ``` name old time/op new time/op delta AvgAggregateInt1K-6 57.1µs ± 1% 55.3µs ± 0% -3.27% (p=0.000 n=9+8) AvgAggregateSmallInt1K-6 60.1µs ± 1% 17.8µs ± 2% -70.33% (p=0.000 n=9+10) AvgAggregateFloat1K-6 10.8µs ± 3% 8.8µs ± 1% -19.02% (p=0.000 n=10+9) AvgAggregateDecimal1K-6 252µs ± 3% 249µs ± 3% ~ (p=0.105 n=10+10) CountAggregate1K-6 5.47µs ± 2% 3.83µs ± 1% -29.99% (p=0.000 n=10+10) SumAggregateInt1K-6 52.1µs ± 1% 49.2µs ± 1% -5.59% (p=0.000 n=10+10) SumAggregateSmallInt1K-6 55.0µs ± 1% 13.3µs ± 2% -75.85% (p=0.000 n=10+10) SumAggregateFloat1K-6 7.04µs ± 5% 5.22µs ± 1% -25.79% (p=0.000 n=10+10) SumAggregateDecimal1K-6 246µs ± 2% 237µs ± 2% -3.77% (p=0.000 n=10+10) MaxAggregateInt1K-6 8.92µs ± 1% 8.48µs ± 2% -4.99% (p=0.000 n=8+10) MaxAggregateFloat1K-6 9.26µs ± 2% 9.94µs ± 1% +7.29% (p=0.000 n=10+10) MaxAggregateDecimal1K-6 113µs ±12% 126µs ±43% ~ (p=0.661 n=9+10) MinAggregateInt1K-6 9.52µs ± 1% 9.23µs ± 1% -3.02% (p=0.000 n=9+10) MinAggregateFloat1K-6 9.12µs ± 1% 8.88µs ± 1% -2.57% (p=0.000 n=8+10) MinAggregateDecimal1K-6 223µs ± 5% 224µs ± 2% ~ (p=0.842 n=10+9) VarianceAggregateInt1K-6 1.49ms ± 1% 1.49ms ± 1% ~ (p=0.274 n=10+8) VarianceAggregateFloat1K-6 14.2µs ± 1% 11.7µs ± 1% -17.42% (p=0.000 n=9+10) VarianceAggregateDecimal1K-6 1.36ms ± 2% 1.31ms ± 2% -4.12% (p=0.000 n=10+10) StddevAggregateInt1K-6 1.69ms ± 1% 1.65ms ± 2% -2.47% (p=0.000 n=10+10) StddevAggregateFloat1K-6 15.7µs ± 1% 11.8µs ± 2% -24.81% (p=0.000 n=9+10) StddevAggregateDecimal1K-6 1.38ms ± 1% 1.32ms ± 5% -4.41% (p=0.000 n=9+10) name old alloc/op new alloc/op delta AvgAggregateInt1K-6 772B ± 0% 692B ± 0% -10.36% (p=0.000 n=10+10) AvgAggregateSmallInt1K-6 756B ± 0% 628B ± 0% -16.93% (p=0.000 n=10+10) AvgAggregateFloat1K-6 128B ± 0% 64B ± 0% -50.00% (p=0.000 n=10+10) AvgAggregateDecimal1K-6 142kB ± 1% 142kB ± 1% ~ (p=0.239 n=10+10) CountAggregate1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) SumAggregateInt1K-6 304B ± 0% 192B ± 0% -36.84% (p=0.000 n=10+10) SumAggregateSmallInt1K-6 304B ± 0% 144B ± 0% -52.63% (p=0.000 n=10+10) SumAggregateFloat1K-6 120B ± 0% 24B ± 0% -80.00% (p=0.000 n=10+10) SumAggregateDecimal1K-6 142kB ± 1% 141kB ± 2% ~ (p=0.123 n=10+10) MaxAggregateInt1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MaxAggregateFloat1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MaxAggregateDecimal1K-6 57.7kB ±17% 67.7kB ±58% ~ (p=0.645 n=9+10) MinAggregateInt1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MinAggregateFloat1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MinAggregateDecimal1K-6 138kB ± 5% 140kB ± 2% ~ (p=0.305 n=10+9) VarianceAggregateInt1K-6 773kB ± 0% 773kB ± 0% ~ (p=0.743 n=8+9) VarianceAggregateFloat1K-6 104B ± 0% 40B ± 0% -61.54% (p=0.000 n=10+10) VarianceAggregateDecimal1K-6 709kB ± 2% 706kB ± 1% ~ (p=0.436 n=10+10) StddevAggregateInt1K-6 869kB ± 0% 868kB ± 0% ~ (p=0.128 n=9+10) StddevAggregateFloat1K-6 112B ± 0% 64B ± 0% -42.86% (p=0.000 n=10+10) StddevAggregateDecimal1K-6 716kB ± 1% 723kB ± 2% +1.06% (p=0.028 n=9+10) name old allocs/op new allocs/op delta AvgAggregateInt1K-6 16.0 ± 0% 15.0 ± 0% -6.25% (p=0.000 n=10+10) AvgAggregateSmallInt1K-6 16.0 ± 0% 14.0 ± 0% -12.50% (p=0.000 n=10+10) AvgAggregateFloat1K-6 3.00 ± 0% 4.00 ± 0% +33.33% (p=0.000 n=10+10) AvgAggregateDecimal1K-6 2.97k ± 1% 2.96k ± 1% ~ (p=0.239 n=10+10) CountAggregate1K-6 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) SumAggregateInt1K-6 5.00 ± 0% 3.00 ± 0% -40.00% (p=0.000 n=10+10) SumAggregateSmallInt1K-6 5.00 ± 0% 2.00 ± 0% -60.00% (p=0.000 n=10+10) SumAggregateFloat1K-6 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) SumAggregateDecimal1K-6 2.95k ± 1% 2.93k ± 2% ~ (p=0.128 n=10+10) MaxAggregateInt1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MaxAggregateFloat1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MaxAggregateDecimal1K-6 1.20k ±17% 1.41k ±58% ~ (p=0.645 n=9+10) MinAggregateInt1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MinAggregateFloat1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MinAggregateDecimal1K-6 2.88k ± 5% 2.92k ± 2% ~ (p=0.305 n=10+9) VarianceAggregateInt1K-6 17.4k ± 0% 17.4k ± 0% ~ (p=0.799 n=8+9) VarianceAggregateFloat1K-6 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) VarianceAggregateDecimal1K-6 15.9k ± 1% 16.0k ± 1% ~ (p=0.905 n=9+10) StddevAggregateInt1K-6 19.1k ± 0% 19.1k ± 0% ~ (p=0.288 n=10+10) StddevAggregateFloat1K-6 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) StddevAggregateDecimal1K-6 16.1k ± 0% 16.3k ± 1% ~ (p=0.079 n=9+10) ```
} | ||
|
||
type avgAggregate struct { | ||
sumAggregate | ||
agg AggregateFunc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this to an interface type? ISTM that this will cause dynamic dispatch where static dispatch would otherwise be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, because there are multiple summing aggregates now. disregard!
Comments from Reviewable |
Prior to this patch,
varianceAggregate
,sumAggregate
(andindirectly
avg
andstddev
aggregates which used the former) triedto be "smart" and share code between various argument types. However
this actually caused extra complexity and run-time overhead. Indeed,
every time the Add method was called:
added to a given variance aggregator will always be of the same
type)
had been instantiated already, to instantiate it if needed (although
given the argument type is known after type checking, this
instantiation could have been performed prior to the first Add call)
This patch addresses this shortcoming by separating implementations by
argument type.
In addition, minor optimizations were also implemented:
sum
(and thusavg
) aggregates now use a singleint64
aslong as the sum does not overflow.
Also a long-standing bug with COUNT was fixed (the COUNT of a tuple of
NULL values is 1, not 0).
Fixes #8678.
Fixes #8711.
This change is