Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: simplify/optimize/fix the aggregate functions. #8680

Merged
merged 2 commits into from
Aug 20, 2016

Conversation

knz
Copy link
Contributor

@knz knz commented Aug 18, 2016

Prior to this patch, varianceAggregate, sumAggregate (and
indirectly avg and stddev aggregates which used the former) tried
to be "smart" and share code between various argument types. However
this actually caused extra complexity and run-time overhead. Indeed,
every time the Add method was called:

  • a type switch was performed on the argument (although every value
    added to a given variance aggregator will always be of the same
    type)
  • a check was made whether an implementation for its argument's type
    had been instantiated already, to instantiate it if needed (although
    given the argument type is known after type checking, this
    instantiation could have been performed prior to the first Add call)

This patch addresses this shortcoming by separating implementations by
argument type.

In addition, minor optimizations were also implemented:

  • the error return value was removed, since there is no error condition.
  • the sum (and thus avg) aggregates now use a single int64 as
    long as the sum does not overflow.

Also a long-standing bug with COUNT was fixed (the COUNT of a tuple of
NULL values is 1, not 0).

name                          old time/op    new time/op    delta
AvgAggregateInt1K-6             57.1µs ± 1%    55.3µs ± 0%   -3.27%          (p=0.000 n=9+8)
AvgAggregateSmallInt1K-6        60.1µs ± 1%    17.8µs ± 2%  -70.33%         (p=0.000 n=9+10)
AvgAggregateFloat1K-6           10.8µs ± 3%     8.8µs ± 1%  -19.02%         (p=0.000 n=10+9)
AvgAggregateDecimal1K-6          252µs ± 3%     249µs ± 3%     ~           (p=0.105 n=10+10)
CountAggregate1K-6              5.47µs ± 2%    3.83µs ± 1%  -29.99%        (p=0.000 n=10+10)
SumAggregateInt1K-6             52.1µs ± 1%    49.2µs ± 1%   -5.59%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6        55.0µs ± 1%    13.3µs ± 2%  -75.85%        (p=0.000 n=10+10)
SumAggregateFloat1K-6           7.04µs ± 5%    5.22µs ± 1%  -25.79%        (p=0.000 n=10+10)
SumAggregateDecimal1K-6          246µs ± 2%     237µs ± 2%   -3.77%        (p=0.000 n=10+10)
MaxAggregateInt1K-6             8.92µs ± 1%    8.48µs ± 2%   -4.99%         (p=0.000 n=8+10)
MaxAggregateFloat1K-6           9.26µs ± 2%    9.94µs ± 1%   +7.29%        (p=0.000 n=10+10)
MaxAggregateDecimal1K-6          113µs ±12%     126µs ±43%     ~            (p=0.661 n=9+10)
MinAggregateInt1K-6             9.52µs ± 1%    9.23µs ± 1%   -3.02%         (p=0.000 n=9+10)
MinAggregateFloat1K-6           9.12µs ± 1%    8.88µs ± 1%   -2.57%         (p=0.000 n=8+10)
MinAggregateDecimal1K-6          223µs ± 5%     224µs ± 2%     ~            (p=0.842 n=10+9)
VarianceAggregateInt1K-6        1.49ms ± 1%    1.49ms ± 1%     ~            (p=0.274 n=10+8)
VarianceAggregateFloat1K-6      14.2µs ± 1%    11.7µs ± 1%  -17.42%         (p=0.000 n=9+10)
VarianceAggregateDecimal1K-6    1.36ms ± 2%    1.31ms ± 2%   -4.12%        (p=0.000 n=10+10)
StddevAggregateInt1K-6          1.69ms ± 1%    1.65ms ± 2%   -2.47%        (p=0.000 n=10+10)
StddevAggregateFloat1K-6        15.7µs ± 1%    11.8µs ± 2%  -24.81%         (p=0.000 n=9+10)
StddevAggregateDecimal1K-6      1.38ms ± 1%    1.32ms ± 5%   -4.41%         (p=0.000 n=9+10)

name                          old alloc/op   new alloc/op   delta
AvgAggregateInt1K-6               772B ± 0%      692B ± 0%  -10.36%        (p=0.000 n=10+10)
AvgAggregateSmallInt1K-6          756B ± 0%      628B ± 0%  -16.93%        (p=0.000 n=10+10)
AvgAggregateFloat1K-6             128B ± 0%       64B ± 0%  -50.00%        (p=0.000 n=10+10)
AvgAggregateDecimal1K-6          142kB ± 1%     142kB ± 1%     ~           (p=0.239 n=10+10)
CountAggregate1K-6               16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SumAggregateInt1K-6               304B ± 0%      192B ± 0%  -36.84%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6          304B ± 0%      144B ± 0%  -52.63%        (p=0.000 n=10+10)
SumAggregateFloat1K-6             120B ± 0%       24B ± 0%  -80.00%        (p=0.000 n=10+10)
SumAggregateDecimal1K-6          142kB ± 1%     141kB ± 2%     ~           (p=0.123 n=10+10)
MaxAggregateInt1K-6              16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MaxAggregateFloat1K-6            16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MaxAggregateDecimal1K-6         57.7kB ±17%    67.7kB ±58%     ~            (p=0.645 n=9+10)
MinAggregateInt1K-6              16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MinAggregateFloat1K-6            16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MinAggregateDecimal1K-6          138kB ± 5%     140kB ± 2%     ~            (p=0.305 n=10+9)
VarianceAggregateInt1K-6         773kB ± 0%     773kB ± 0%     ~             (p=0.743 n=8+9)
VarianceAggregateFloat1K-6        104B ± 0%       40B ± 0%  -61.54%        (p=0.000 n=10+10)
VarianceAggregateDecimal1K-6     709kB ± 2%     706kB ± 1%     ~           (p=0.436 n=10+10)
StddevAggregateInt1K-6           869kB ± 0%     868kB ± 0%     ~            (p=0.128 n=9+10)
StddevAggregateFloat1K-6          112B ± 0%       64B ± 0%  -42.86%        (p=0.000 n=10+10)
StddevAggregateDecimal1K-6       716kB ± 1%     723kB ± 2%   +1.06%         (p=0.028 n=9+10)

name                          old allocs/op  new allocs/op  delta
AvgAggregateInt1K-6               16.0 ± 0%      15.0 ± 0%   -6.25%        (p=0.000 n=10+10)
AvgAggregateSmallInt1K-6          16.0 ± 0%      14.0 ± 0%  -12.50%        (p=0.000 n=10+10)
AvgAggregateFloat1K-6             3.00 ± 0%      4.00 ± 0%  +33.33%        (p=0.000 n=10+10)
AvgAggregateDecimal1K-6          2.97k ± 1%     2.96k ± 1%     ~           (p=0.239 n=10+10)
CountAggregate1K-6                2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SumAggregateInt1K-6               5.00 ± 0%      3.00 ± 0%  -40.00%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6          5.00 ± 0%      2.00 ± 0%  -60.00%        (p=0.000 n=10+10)
SumAggregateFloat1K-6             2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SumAggregateDecimal1K-6          2.95k ± 1%     2.93k ± 2%     ~           (p=0.128 n=10+10)
MaxAggregateInt1K-6               1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MaxAggregateFloat1K-6             1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MaxAggregateDecimal1K-6          1.20k ±17%     1.41k ±58%     ~            (p=0.645 n=9+10)
MinAggregateInt1K-6               1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MinAggregateFloat1K-6             1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MinAggregateDecimal1K-6          2.88k ± 5%     2.92k ± 2%     ~            (p=0.305 n=10+9)
VarianceAggregateInt1K-6         17.4k ± 0%     17.4k ± 0%     ~             (p=0.799 n=8+9)
VarianceAggregateFloat1K-6        3.00 ± 0%      2.00 ± 0%  -33.33%        (p=0.000 n=10+10)
VarianceAggregateDecimal1K-6     15.9k ± 1%     16.0k ± 1%     ~            (p=0.905 n=9+10)
StddevAggregateInt1K-6           19.1k ± 0%     19.1k ± 0%     ~           (p=0.288 n=10+10)
StddevAggregateFloat1K-6          4.00 ± 0%      4.00 ± 0%     ~     (all samples are equal)
StddevAggregateDecimal1K-6       16.1k ± 0%     16.3k ± 1%     ~            (p=0.079 n=9+10)

Fixes #8678.
Fixes #8711.


This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Aug 18, 2016

@nvanbenschoten @RaduBerinde if you have any suggestion as to why int sum and average become slower, I'll take it. Thanks!

@knz knz force-pushed the simplify-variance branch 3 times, most recently from fe64f53 to d8a53a0 Compare August 19, 2016 07:47
@knz knz assigned dt Aug 19, 2016
@dt
Copy link
Member

dt commented Aug 19, 2016

@knz can you include -benchmem in you benchmarks?

@RaduBerinde
Copy link
Member

@dt why me? :)

@RaduBerinde
Copy link
Member

RaduBerinde commented Aug 19, 2016

The int may be slower because we used to call inf.Dec.Add and now we are calling decimalSumAggregate.Add which has to make a downcast to *DDecimal. You could add a method decimalSumAggregate.AddDec that takes an inf.Dec directly. Otherwise, :lgtm:

It is really unfortunate we have to go through decimals for all int aggregations. I assume the reason is overflow? We could use int64 and check for overflow on each Add, and only switch to decimals if/when we need to. This doesn't seem very complicated to me, and it would all be internal to intSumAggregate. The same goes for intVarianceAggregate but that's less important. Of course this is perpendicular to this PR, but we should file an issue to do this.


Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@dt
Copy link
Member

dt commented Aug 19, 2016

:lgtm: on the agg handling side, though I defer to @nvanbenschoten on some of the impl changes since I think he's been in there most recently.


Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Aug 19, 2016

Thanks @RaduBerinde for the hint. I have simplified the int sum as you suggest and indeed it thus becomes faster. I also implemented the suggested optimization to use ints for the sum until an overflow occurs, however the overflow check itself is rather expensive (time wise) so there isn't a big difference in the benchmark.


Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, some commit checks pending.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Aug 19, 2016

Also I put the benchmem results in the commits message as you suggested @dt.

@nvanbenschoten
Copy link
Member

General approach :lgtm:. Could you add a breif enumeration of the simplifications/optimizations you made in the commit message. Something like:

Specific changes:
- Removed error return values from AggregateFunc interface
- Split aggregators into type specific aggregators and resolve type externally
... 

Interestingly, WRT the addition in an int64 for Decimal values when possible, if we end up going forward with #6009, we will get that behavior for free everywhere.


Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


sql/parser/aggregate_builtins.go, line 425 [r2] (raw file):

  t := int64(*datum.(*DInt))
  if t != 0 {
      if !a.large &&

This deserves a comment.


Comments from Reviewable

@RaduBerinde
Copy link
Member

@knz Nice! Note that the existing benchmarks don't really exercise the optimization (they use random 63 bit integers so they are likely to overflow very soon). Maybe add a SmallInt version of the benchmarks (and a makeSmallIntTestDatum) where we e.g. use integers less than 10000.


Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


Comments from Reviewable

@knz knz mentioned this pull request Aug 20, 2016
@knz
Copy link
Contributor Author

knz commented Aug 20, 2016

@RaduBerinde I have added the small int benchmarks as you suggested and updated the commit message with the new benchmark results. @nvanbenschoten I extended the commit message accordingly.


Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


sql/parser/aggregate_builtins.go, line 425 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This deserves a comment.

Done.

Comments from Reviewable

@knz knz changed the title sql: simplify/optimize the aggregate functions. sql: simplify/optimize/fix the aggregate functions. Aug 20, 2016
Prior to this patch, `varianceAggregate`, `sumAggregate` (and
indirectly `avg` and `stddev` aggregates which used the former) tried
to be "smart" and share code between various argument types. However
this actually caused extra complexity and run-time overhead. Indeed,
every time the Add method was called:

- a type switch was performed on the argument (although every value
  added to a given variance aggregator will always be of the same
  type)

- a check was made whether an implementation for its argument's type
  had been instantiated already, to instantiate it if needed (although
  given the argument type is known after type checking, this
  instantiation could have been performed prior to the first Add call)

This patch addresses this shortcoming by separating implementations by
argument type.

In addition, minor optimizations were also implemented:

- the error return value was removed, since there is no error condition.
- the `sum` (and thus `avg`) aggregates now use a single `int64` as
  long as the sum does not overflow.

Also a long-standing bug with COUNT was fixed (the COUNT of a tuple of
NULL values is 1, not 0).

```
name                          old time/op    new time/op    delta
AvgAggregateInt1K-6             57.1µs ± 1%    55.3µs ± 0%   -3.27%          (p=0.000 n=9+8)
AvgAggregateSmallInt1K-6        60.1µs ± 1%    17.8µs ± 2%  -70.33%         (p=0.000 n=9+10)
AvgAggregateFloat1K-6           10.8µs ± 3%     8.8µs ± 1%  -19.02%         (p=0.000 n=10+9)
AvgAggregateDecimal1K-6          252µs ± 3%     249µs ± 3%     ~           (p=0.105 n=10+10)
CountAggregate1K-6              5.47µs ± 2%    3.83µs ± 1%  -29.99%        (p=0.000 n=10+10)
SumAggregateInt1K-6             52.1µs ± 1%    49.2µs ± 1%   -5.59%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6        55.0µs ± 1%    13.3µs ± 2%  -75.85%        (p=0.000 n=10+10)
SumAggregateFloat1K-6           7.04µs ± 5%    5.22µs ± 1%  -25.79%        (p=0.000 n=10+10)
SumAggregateDecimal1K-6          246µs ± 2%     237µs ± 2%   -3.77%        (p=0.000 n=10+10)
MaxAggregateInt1K-6             8.92µs ± 1%    8.48µs ± 2%   -4.99%         (p=0.000 n=8+10)
MaxAggregateFloat1K-6           9.26µs ± 2%    9.94µs ± 1%   +7.29%        (p=0.000 n=10+10)
MaxAggregateDecimal1K-6          113µs ±12%     126µs ±43%     ~            (p=0.661 n=9+10)
MinAggregateInt1K-6             9.52µs ± 1%    9.23µs ± 1%   -3.02%         (p=0.000 n=9+10)
MinAggregateFloat1K-6           9.12µs ± 1%    8.88µs ± 1%   -2.57%         (p=0.000 n=8+10)
MinAggregateDecimal1K-6          223µs ± 5%     224µs ± 2%     ~            (p=0.842 n=10+9)
VarianceAggregateInt1K-6        1.49ms ± 1%    1.49ms ± 1%     ~            (p=0.274 n=10+8)
VarianceAggregateFloat1K-6      14.2µs ± 1%    11.7µs ± 1%  -17.42%         (p=0.000 n=9+10)
VarianceAggregateDecimal1K-6    1.36ms ± 2%    1.31ms ± 2%   -4.12%        (p=0.000 n=10+10)
StddevAggregateInt1K-6          1.69ms ± 1%    1.65ms ± 2%   -2.47%        (p=0.000 n=10+10)
StddevAggregateFloat1K-6        15.7µs ± 1%    11.8µs ± 2%  -24.81%         (p=0.000 n=9+10)
StddevAggregateDecimal1K-6      1.38ms ± 1%    1.32ms ± 5%   -4.41%         (p=0.000 n=9+10)

name                          old alloc/op   new alloc/op   delta
AvgAggregateInt1K-6               772B ± 0%      692B ± 0%  -10.36%        (p=0.000 n=10+10)
AvgAggregateSmallInt1K-6          756B ± 0%      628B ± 0%  -16.93%        (p=0.000 n=10+10)
AvgAggregateFloat1K-6             128B ± 0%       64B ± 0%  -50.00%        (p=0.000 n=10+10)
AvgAggregateDecimal1K-6          142kB ± 1%     142kB ± 1%     ~           (p=0.239 n=10+10)
CountAggregate1K-6               16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SumAggregateInt1K-6               304B ± 0%      192B ± 0%  -36.84%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6          304B ± 0%      144B ± 0%  -52.63%        (p=0.000 n=10+10)
SumAggregateFloat1K-6             120B ± 0%       24B ± 0%  -80.00%        (p=0.000 n=10+10)
SumAggregateDecimal1K-6          142kB ± 1%     141kB ± 2%     ~           (p=0.123 n=10+10)
MaxAggregateInt1K-6              16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MaxAggregateFloat1K-6            16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MaxAggregateDecimal1K-6         57.7kB ±17%    67.7kB ±58%     ~            (p=0.645 n=9+10)
MinAggregateInt1K-6              16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MinAggregateFloat1K-6            16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
MinAggregateDecimal1K-6          138kB ± 5%     140kB ± 2%     ~            (p=0.305 n=10+9)
VarianceAggregateInt1K-6         773kB ± 0%     773kB ± 0%     ~             (p=0.743 n=8+9)
VarianceAggregateFloat1K-6        104B ± 0%       40B ± 0%  -61.54%        (p=0.000 n=10+10)
VarianceAggregateDecimal1K-6     709kB ± 2%     706kB ± 1%     ~           (p=0.436 n=10+10)
StddevAggregateInt1K-6           869kB ± 0%     868kB ± 0%     ~            (p=0.128 n=9+10)
StddevAggregateFloat1K-6          112B ± 0%       64B ± 0%  -42.86%        (p=0.000 n=10+10)
StddevAggregateDecimal1K-6       716kB ± 1%     723kB ± 2%   +1.06%         (p=0.028 n=9+10)

name                          old allocs/op  new allocs/op  delta
AvgAggregateInt1K-6               16.0 ± 0%      15.0 ± 0%   -6.25%        (p=0.000 n=10+10)
AvgAggregateSmallInt1K-6          16.0 ± 0%      14.0 ± 0%  -12.50%        (p=0.000 n=10+10)
AvgAggregateFloat1K-6             3.00 ± 0%      4.00 ± 0%  +33.33%        (p=0.000 n=10+10)
AvgAggregateDecimal1K-6          2.97k ± 1%     2.96k ± 1%     ~           (p=0.239 n=10+10)
CountAggregate1K-6                2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SumAggregateInt1K-6               5.00 ± 0%      3.00 ± 0%  -40.00%        (p=0.000 n=10+10)
SumAggregateSmallInt1K-6          5.00 ± 0%      2.00 ± 0%  -60.00%        (p=0.000 n=10+10)
SumAggregateFloat1K-6             2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SumAggregateDecimal1K-6          2.95k ± 1%     2.93k ± 2%     ~           (p=0.128 n=10+10)
MaxAggregateInt1K-6               1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MaxAggregateFloat1K-6             1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MaxAggregateDecimal1K-6          1.20k ±17%     1.41k ±58%     ~            (p=0.645 n=9+10)
MinAggregateInt1K-6               1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MinAggregateFloat1K-6             1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MinAggregateDecimal1K-6          2.88k ± 5%     2.92k ± 2%     ~            (p=0.305 n=10+9)
VarianceAggregateInt1K-6         17.4k ± 0%     17.4k ± 0%     ~             (p=0.799 n=8+9)
VarianceAggregateFloat1K-6        3.00 ± 0%      2.00 ± 0%  -33.33%        (p=0.000 n=10+10)
VarianceAggregateDecimal1K-6     15.9k ± 1%     16.0k ± 1%     ~            (p=0.905 n=9+10)
StddevAggregateInt1K-6           19.1k ± 0%     19.1k ± 0%     ~           (p=0.288 n=10+10)
StddevAggregateFloat1K-6          4.00 ± 0%      4.00 ± 0%     ~     (all samples are equal)
StddevAggregateDecimal1K-6       16.1k ± 0%     16.3k ± 1%     ~            (p=0.079 n=9+10)
```
@knz knz merged commit 2fe16b5 into cockroachdb:develop Aug 20, 2016
@knz knz deleted the simplify-variance branch August 20, 2016 11:59
}

type avgAggregate struct {
sumAggregate
agg AggregateFunc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change this to an interface type? ISTM that this will cause dynamic dispatch where static dispatch would otherwise be possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, because there are multiple summing aggregates now. disregard!

@RaduBerinde
Copy link
Member

Nice! :lgtm_strong:


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants