sql: simplify/optimize/fix the aggregate functions. #8680

knz · 2016-08-18T23:57:55Z

Prior to this patch, varianceAggregate, sumAggregate (and
indirectly avg and stddev aggregates which used the former) tried
to be "smart" and share code between various argument types. However
this actually caused extra complexity and run-time overhead. Indeed,
every time the Add method was called:

a type switch was performed on the argument (although every value
added to a given variance aggregator will always be of the same
type)
a check was made whether an implementation for its argument's type
had been instantiated already, to instantiate it if needed (although
given the argument type is known after type checking, this
instantiation could have been performed prior to the first Add call)

This patch addresses this shortcoming by separating implementations by
argument type.

In addition, minor optimizations were also implemented:

the error return value was removed, since there is no error condition.
the sum (and thus avg) aggregates now use a single int64 as
long as the sum does not overflow.

Also a long-standing bug with COUNT was fixed (the COUNT of a tuple of
NULL values is 1, not 0).

name                          old time/op    new time/op    delta
AvgAggregateInt1K-6             57.1µs ± 1%    55.3µs ± 0%   -3.27%          (p=0.000 n=9+8)
AvgAggregateSmallInt1K-6        60.1µs ± 1%    17.8µs ± 2%  -70.33%         (p=0.000 n=9+10)
AvgAggregateFloat1K-6           10.8µs ± 3%     8.8µs ± 1%  -19.02%         (p=0.000 n=10+9)
AvgAggregateDecimal1K-6          252µs ± 3%     249µs ± 3%     ~           (p=0.105 n=10+10)
CountAggregate1K-6              5.47µs ± 2%    3.83µs ± 1%  -29.99%        (p=0.000 n=10+10)
SumAggregateInt1K-6             52.1µs ± 1%    49.2µs ± 1%   -5.59%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6        55.0µs ± 1%    13.3µs ± 2%  -75.85%        (p=0.000 n=10+10)
SumAggregateFloat1K-6           7.04µs ± 5%    5.22µs ± 1%  -25.79%        (p=0.000 n=10+10)
SumAggregateDecimal1K-6          246µs ± 2%     237µs ± 2%   -3.77%        (p=0.000 n=10+10)
MaxAggregateInt1K-6             8.92µs ± 1%    8.48µs ± 2%   -4.99%         (p=0.000 n=8+10)
MaxAggregateFloat1K-6           9.26µs ± 2%    9.94µs ± 1%   +7.29%        (p=0.000 n=10+10)
MaxAggregateDecimal1K-6          113µs ±12%     126µs ±43%     ~            (p=0.661 n=9+10)
MinAggregateInt1K-6             9.52µs ± 1%    9.23µs ± 1%   -3.02%         (p=0.000 n=9+10)
MinAggregateFloat1K-6           9.12µs ± 1%    8.88µs ± 1%   -2.57%         (p=0.000 n=8+10)
MinAggregateDecimal1K-6          223µs ± 5%     224µs ± 2%     ~            (p=0.842 n=10+9)
VarianceAggregateInt1K-6        1.49ms ± 1%    1.49ms ± 1%     ~            (p=0.274 n=10+8)
VarianceAggregateFloat1K-6      14.2µs ± 1%    11.7µs ± 1%  -17.42%         (p=0.000 n=9+10)
VarianceAggregateDecimal1K-6    1.36ms ± 2%    1.31ms ± 2%   -4.12%        (p=0.000 n=10+10)
StddevAggregateInt1K-6          1.69ms ± 1%    1.65ms ± 2%   -2.47%        (p=0.000 n=10+10)
StddevAggregateFloat1K-6        15.7µs ± 1%    11.8µs ± 2%  -24.81%         (p=0.000 n=9+10)
StddevAggregateDecimal1K-6      1.38ms ± 1%    1.32ms ± 5%   -4.41%         (p=0.000 n=9+10)

name                          old alloc/op   new alloc/op   delta
AvgAggregateInt1K-6               772B ± 0%      692B ± 0%  -10.36%        (p=0.000 n=10+10)
AvgAggregateSmallInt1K-6          756B ± 0%      628B ± 0%  -16.93%        (p=0.000 n=10+10)
AvgAggregateFloat1K-6             128B ± 0%       64B ± 0%  -50.00%        (p=0.000 n=10+10)
AvgAggregateDecimal1K-6          142kB ± 1%     142kB ± 1%     ~           (p=0.239 n=10+10)
CountAggregate1K-6               16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SumAggregateInt1K-6               304B ± 0%      192B ± 0%  -36.84%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6          304B ± 0%      144B ± 0%  -52.63%        (p=0.000 n=10+10)
SumAggregateFloat1K-6             120B ± 0%       24B ± 0%  -80.00%        (p=0.000 n=10+10)
SumAggregateDecimal1K-6          142kB ± 1%     141kB ± 2%     ~           (p=0.123 n=10+10)
MaxAggregateInt1K-6              16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MaxAggregateFloat1K-6            16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MaxAggregateDecimal1K-6         57.7kB ±17%    67.7kB ±58%     ~            (p=0.645 n=9+10)
MinAggregateInt1K-6              16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MinAggregateFloat1K-6            16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MinAggregateDecimal1K-6          138kB ± 5%     140kB ± 2%     ~            (p=0.305 n=10+9)
VarianceAggregateInt1K-6         773kB ± 0%     773kB ± 0%     ~             (p=0.743 n=8+9)
VarianceAggregateFloat1K-6        104B ± 0%       40B ± 0%  -61.54%        (p=0.000 n=10+10)
VarianceAggregateDecimal1K-6     709kB ± 2%     706kB ± 1%     ~           (p=0.436 n=10+10)
StddevAggregateInt1K-6           869kB ± 0%     868kB ± 0%     ~            (p=0.128 n=9+10)
StddevAggregateFloat1K-6          112B ± 0%       64B ± 0%  -42.86%        (p=0.000 n=10+10)
StddevAggregateDecimal1K-6       716kB ± 1%     723kB ± 2%   +1.06%         (p=0.028 n=9+10)

name                          old allocs/op  new allocs/op  delta
AvgAggregateInt1K-6               16.0 ± 0%      15.0 ± 0%   -6.25%        (p=0.000 n=10+10)
AvgAggregateSmallInt1K-6          16.0 ± 0%      14.0 ± 0%  -12.50%        (p=0.000 n=10+10)
AvgAggregateFloat1K-6             3.00 ± 0%      4.00 ± 0%  +33.33%        (p=0.000 n=10+10)
AvgAggregateDecimal1K-6          2.97k ± 1%     2.96k ± 1%     ~           (p=0.239 n=10+10)
CountAggregate1K-6                2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SumAggregateInt1K-6               5.00 ± 0%      3.00 ± 0%  -40.00%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6          5.00 ± 0%      2.00 ± 0%  -60.00%        (p=0.000 n=10+10)
SumAggregateFloat1K-6             2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SumAggregateDecimal1K-6          2.95k ± 1%     2.93k ± 2%     ~           (p=0.128 n=10+10)
MaxAggregateInt1K-6               1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MaxAggregateFloat1K-6             1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MaxAggregateDecimal1K-6          1.20k ±17%     1.41k ±58%     ~            (p=0.645 n=9+10)
MinAggregateInt1K-6               1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MinAggregateFloat1K-6             1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MinAggregateDecimal1K-6          2.88k ± 5%     2.92k ± 2%     ~            (p=0.305 n=10+9)
VarianceAggregateInt1K-6         17.4k ± 0%     17.4k ± 0%     ~             (p=0.799 n=8+9)
VarianceAggregateFloat1K-6        3.00 ± 0%      2.00 ± 0%  -33.33%        (p=0.000 n=10+10)
VarianceAggregateDecimal1K-6     15.9k ± 1%     16.0k ± 1%     ~            (p=0.905 n=9+10)
StddevAggregateInt1K-6           19.1k ± 0%     19.1k ± 0%     ~           (p=0.288 n=10+10)
StddevAggregateFloat1K-6          4.00 ± 0%      4.00 ± 0%     ~     (all samples are equal)
StddevAggregateDecimal1K-6       16.1k ± 0%     16.3k ± 1%     ~            (p=0.079 n=9+10)

Fixes #8678.
Fixes #8711.

This change is

knz · 2016-08-18T23:58:26Z

@nvanbenschoten @RaduBerinde if you have any suggestion as to why int sum and average become slower, I'll take it. Thanks!

dt · 2016-08-19T16:47:58Z

@knz can you include -benchmem in you benchmarks?

RaduBerinde · 2016-08-19T16:48:35Z

@dt why me? :)

RaduBerinde · 2016-08-19T17:09:12Z

The int may be slower because we used to call inf.Dec.Add and now we are calling decimalSumAggregate.Add which has to make a downcast to *DDecimal. You could add a method decimalSumAggregate.AddDec that takes an inf.Dec directly. Otherwise,

It is really unfortunate we have to go through decimals for all int aggregations. I assume the reason is overflow? We could use int64 and check for overflow on each Add, and only switch to decimals if/when we need to. This doesn't seem very complicated to me, and it would all be internal to intSumAggregate. The same goes for intVarianceAggregate but that's less important. Of course this is perpendicular to this PR, but we should file an issue to do this.

Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, all commit checks successful.

Comments from Reviewable

dt · 2016-08-19T17:25:51Z

on the agg handling side, though I defer to @nvanbenschoten on some of the impl changes since I think he's been in there most recently.

Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, all commit checks successful.

Comments from Reviewable

knz · 2016-08-19T17:44:33Z

Thanks @RaduBerinde for the hint. I have simplified the int sum as you suggest and indeed it thus becomes faster. I also implemented the suggested optimization to use ints for the sum until an overflow occurs, however the overflow check itself is rather expensive (time wise) so there isn't a big difference in the benchmark.

Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, some commit checks pending.

Comments from Reviewable

knz · 2016-08-19T17:45:08Z

Also I put the benchmem results in the commits message as you suggested @dt.

nvanbenschoten · 2016-08-19T19:38:27Z

General approach . Could you add a breif enumeration of the simplifications/optimizations you made in the commit message. Something like:

Specific changes:
- Removed error return values from AggregateFunc interface
- Split aggregators into type specific aggregators and resolve type externally
...

Interestingly, WRT the addition in an int64 for Decimal values when possible, if we end up going forward with #6009, we will get that behavior for free everywhere.

Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

sql/parser/aggregate_builtins.go, line 425 [r2] (raw file):

  t := int64(*datum.(*DInt))
  if t != 0 {
      if !a.large &&

This deserves a comment.

Comments from Reviewable

RaduBerinde · 2016-08-19T19:57:08Z

@knz Nice! Note that the existing benchmarks don't really exercise the optimization (they use random 63 bit integers so they are likely to overflow very soon). Maybe add a SmallInt version of the benchmarks (and a makeSmallIntTestDatum) where we e.g. use integers less than 10000.

Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

Comments from Reviewable

knz · 2016-08-20T10:31:58Z

@RaduBerinde I have added the small int benchmarks as you suggested and updated the commit message with the new benchmark results. @nvanbenschoten I extended the commit message accordingly.

Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.

sql/parser/aggregate_builtins.go, line 425 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This deserves a comment.

Done.

Comments from Reviewable

Prior to this patch, `varianceAggregate`, `sumAggregate` (and indirectly `avg` and `stddev` aggregates which used the former) tried to be "smart" and share code between various argument types. However this actually caused extra complexity and run-time overhead. Indeed, every time the Add method was called: - a type switch was performed on the argument (although every value added to a given variance aggregator will always be of the same type) - a check was made whether an implementation for its argument's type had been instantiated already, to instantiate it if needed (although given the argument type is known after type checking, this instantiation could have been performed prior to the first Add call) This patch addresses this shortcoming by separating implementations by argument type. In addition, minor optimizations were also implemented: - the error return value was removed, since there is no error condition. - the `sum` (and thus `avg`) aggregates now use a single `int64` as long as the sum does not overflow. Also a long-standing bug with COUNT was fixed (the COUNT of a tuple of NULL values is 1, not 0). ``` name old time/op new time/op delta AvgAggregateInt1K-6 57.1µs ± 1% 55.3µs ± 0% -3.27% (p=0.000 n=9+8) AvgAggregateSmallInt1K-6 60.1µs ± 1% 17.8µs ± 2% -70.33% (p=0.000 n=9+10) AvgAggregateFloat1K-6 10.8µs ± 3% 8.8µs ± 1% -19.02% (p=0.000 n=10+9) AvgAggregateDecimal1K-6 252µs ± 3% 249µs ± 3% ~ (p=0.105 n=10+10) CountAggregate1K-6 5.47µs ± 2% 3.83µs ± 1% -29.99% (p=0.000 n=10+10) SumAggregateInt1K-6 52.1µs ± 1% 49.2µs ± 1% -5.59% (p=0.000 n=10+10) SumAggregateSmallInt1K-6 55.0µs ± 1% 13.3µs ± 2% -75.85% (p=0.000 n=10+10) SumAggregateFloat1K-6 7.04µs ± 5% 5.22µs ± 1% -25.79% (p=0.000 n=10+10) SumAggregateDecimal1K-6 246µs ± 2% 237µs ± 2% -3.77% (p=0.000 n=10+10) MaxAggregateInt1K-6 8.92µs ± 1% 8.48µs ± 2% -4.99% (p=0.000 n=8+10) MaxAggregateFloat1K-6 9.26µs ± 2% 9.94µs ± 1% +7.29% (p=0.000 n=10+10) MaxAggregateDecimal1K-6 113µs ±12% 126µs ±43% ~ (p=0.661 n=9+10) MinAggregateInt1K-6 9.52µs ± 1% 9.23µs ± 1% -3.02% (p=0.000 n=9+10) MinAggregateFloat1K-6 9.12µs ± 1% 8.88µs ± 1% -2.57% (p=0.000 n=8+10) MinAggregateDecimal1K-6 223µs ± 5% 224µs ± 2% ~ (p=0.842 n=10+9) VarianceAggregateInt1K-6 1.49ms ± 1% 1.49ms ± 1% ~ (p=0.274 n=10+8) VarianceAggregateFloat1K-6 14.2µs ± 1% 11.7µs ± 1% -17.42% (p=0.000 n=9+10) VarianceAggregateDecimal1K-6 1.36ms ± 2% 1.31ms ± 2% -4.12% (p=0.000 n=10+10) StddevAggregateInt1K-6 1.69ms ± 1% 1.65ms ± 2% -2.47% (p=0.000 n=10+10) StddevAggregateFloat1K-6 15.7µs ± 1% 11.8µs ± 2% -24.81% (p=0.000 n=9+10) StddevAggregateDecimal1K-6 1.38ms ± 1% 1.32ms ± 5% -4.41% (p=0.000 n=9+10) name old alloc/op new alloc/op delta AvgAggregateInt1K-6 772B ± 0% 692B ± 0% -10.36% (p=0.000 n=10+10) AvgAggregateSmallInt1K-6 756B ± 0% 628B ± 0% -16.93% (p=0.000 n=10+10) AvgAggregateFloat1K-6 128B ± 0% 64B ± 0% -50.00% (p=0.000 n=10+10) AvgAggregateDecimal1K-6 142kB ± 1% 142kB ± 1% ~ (p=0.239 n=10+10) CountAggregate1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) SumAggregateInt1K-6 304B ± 0% 192B ± 0% -36.84% (p=0.000 n=10+10) SumAggregateSmallInt1K-6 304B ± 0% 144B ± 0% -52.63% (p=0.000 n=10+10) SumAggregateFloat1K-6 120B ± 0% 24B ± 0% -80.00% (p=0.000 n=10+10) SumAggregateDecimal1K-6 142kB ± 1% 141kB ± 2% ~ (p=0.123 n=10+10) MaxAggregateInt1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MaxAggregateFloat1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MaxAggregateDecimal1K-6 57.7kB ±17% 67.7kB ±58% ~ (p=0.645 n=9+10) MinAggregateInt1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MinAggregateFloat1K-6 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal) MinAggregateDecimal1K-6 138kB ± 5% 140kB ± 2% ~ (p=0.305 n=10+9) VarianceAggregateInt1K-6 773kB ± 0% 773kB ± 0% ~ (p=0.743 n=8+9) VarianceAggregateFloat1K-6 104B ± 0% 40B ± 0% -61.54% (p=0.000 n=10+10) VarianceAggregateDecimal1K-6 709kB ± 2% 706kB ± 1% ~ (p=0.436 n=10+10) StddevAggregateInt1K-6 869kB ± 0% 868kB ± 0% ~ (p=0.128 n=9+10) StddevAggregateFloat1K-6 112B ± 0% 64B ± 0% -42.86% (p=0.000 n=10+10) StddevAggregateDecimal1K-6 716kB ± 1% 723kB ± 2% +1.06% (p=0.028 n=9+10) name old allocs/op new allocs/op delta AvgAggregateInt1K-6 16.0 ± 0% 15.0 ± 0% -6.25% (p=0.000 n=10+10) AvgAggregateSmallInt1K-6 16.0 ± 0% 14.0 ± 0% -12.50% (p=0.000 n=10+10) AvgAggregateFloat1K-6 3.00 ± 0% 4.00 ± 0% +33.33% (p=0.000 n=10+10) AvgAggregateDecimal1K-6 2.97k ± 1% 2.96k ± 1% ~ (p=0.239 n=10+10) CountAggregate1K-6 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) SumAggregateInt1K-6 5.00 ± 0% 3.00 ± 0% -40.00% (p=0.000 n=10+10) SumAggregateSmallInt1K-6 5.00 ± 0% 2.00 ± 0% -60.00% (p=0.000 n=10+10) SumAggregateFloat1K-6 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) SumAggregateDecimal1K-6 2.95k ± 1% 2.93k ± 2% ~ (p=0.128 n=10+10) MaxAggregateInt1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MaxAggregateFloat1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MaxAggregateDecimal1K-6 1.20k ±17% 1.41k ±58% ~ (p=0.645 n=9+10) MinAggregateInt1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MinAggregateFloat1K-6 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) MinAggregateDecimal1K-6 2.88k ± 5% 2.92k ± 2% ~ (p=0.305 n=10+9) VarianceAggregateInt1K-6 17.4k ± 0% 17.4k ± 0% ~ (p=0.799 n=8+9) VarianceAggregateFloat1K-6 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) VarianceAggregateDecimal1K-6 15.9k ± 1% 16.0k ± 1% ~ (p=0.905 n=9+10) StddevAggregateInt1K-6 19.1k ± 0% 19.1k ± 0% ~ (p=0.288 n=10+10) StddevAggregateFloat1K-6 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) StddevAggregateDecimal1K-6 16.1k ± 0% 16.3k ± 1% ~ (p=0.079 n=9+10) ```

tamird · 2016-08-20T12:04:24Z

sql/parser/aggregate_builtins.go

 }

 type avgAggregate struct {
-	sumAggregate
+	agg   AggregateFunc


why change this to an interface type? ISTM that this will cause dynamic dispatch where static dispatch would otherwise be possible.

oh, because there are multiple summing aggregates now. disregard!

RaduBerinde · 2016-08-20T12:56:21Z

Nice!

Comments from Reviewable

knz force-pushed the simplify-variance branch 3 times, most recently from fe64f53 to d8a53a0 Compare August 19, 2016 07:47

knz assigned dt Aug 19, 2016

knz force-pushed the simplify-variance branch from d8a53a0 to 70c4f33 Compare August 19, 2016 17:41

knz force-pushed the simplify-variance branch from 70c4f33 to 5cfa7b0 Compare August 19, 2016 17:55

knz mentioned this pull request Aug 20, 2016

sql: fix COUNT on tuples #8711

Closed

sql: add an aggregate test for smaller integers.

e4892bf

knz force-pushed the simplify-variance branch from 5cfa7b0 to 2204ef0 Compare August 20, 2016 10:30

knz force-pushed the simplify-variance branch from 2204ef0 to b15781d Compare August 20, 2016 11:08

knz changed the title ~~sql: simplify/optimize the aggregate functions.~~ sql: simplify/optimize/fix the aggregate functions. Aug 20, 2016

knz force-pushed the simplify-variance branch from b15781d to 72a5c19 Compare August 20, 2016 11:33

knz merged commit 2fe16b5 into cockroachdb:develop Aug 20, 2016

knz deleted the simplify-variance branch August 20, 2016 11:59

tamird reviewed Aug 20, 2016
View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: simplify/optimize/fix the aggregate functions. #8680

sql: simplify/optimize/fix the aggregate functions. #8680

knz commented Aug 18, 2016 •

edited

Loading

knz commented Aug 18, 2016

dt commented Aug 19, 2016 •

edited

Loading

RaduBerinde commented Aug 19, 2016

RaduBerinde commented Aug 19, 2016 •

edited

Loading

dt commented Aug 19, 2016

knz commented Aug 19, 2016

knz commented Aug 19, 2016

nvanbenschoten commented Aug 19, 2016

RaduBerinde commented Aug 19, 2016

knz commented Aug 20, 2016

tamird Aug 20, 2016

tamird Aug 20, 2016

RaduBerinde commented Aug 20, 2016

sql: simplify/optimize/fix the aggregate functions. #8680

sql: simplify/optimize/fix the aggregate functions. #8680

Conversation

knz commented Aug 18, 2016 • edited Loading

knz commented Aug 18, 2016

dt commented Aug 19, 2016 • edited Loading

RaduBerinde commented Aug 19, 2016

RaduBerinde commented Aug 19, 2016 • edited Loading

dt commented Aug 19, 2016

knz commented Aug 19, 2016

knz commented Aug 19, 2016

nvanbenschoten commented Aug 19, 2016

RaduBerinde commented Aug 19, 2016

knz commented Aug 20, 2016

tamird Aug 20, 2016

Choose a reason for hiding this comment

tamird Aug 20, 2016

Choose a reason for hiding this comment

RaduBerinde commented Aug 20, 2016

knz commented Aug 18, 2016 •

edited

Loading

dt commented Aug 19, 2016 •

edited

Loading

RaduBerinde commented Aug 19, 2016 •

edited

Loading