
Implement Underlying Grouping Sets #42631

Closed
AilinKid opened this issue Mar 28, 2023 · 0 comments · Fixed by #46906, #54536, #54962 or #55024
Labels: sig/execution (SIG execution), sig/planner (SIG: Planner), type/feature-request (Categorizes issue or PR as related to a new feature.)


AilinKid (Contributor) commented Mar 28, 2023

Feature Request

Is your feature request related to a problem? Please describe:

Grouping sets are the internal implementation mechanism behind both the multi-distinct-aggregate MPP optimization and the ROLLUP/CUBE syntax, for example:

SELECT a, b, sum(expression) FROM table GROUP BY a, b WITH ROLLUP;

Modern databases such as Spark SQL also let users list the desired grouping sets explicitly, like this:

SELECT a, b, sum(expression) FROM table GROUP BY a, b GROUPING SETS((a,b),(x,x),(...));

Each listed grouping set requires a different aggregation granularity, so the underlying data must be expanded into multiple copies, one per grouping set. The user then receives the union of the rows aggregated at each level.
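A minimal sketch of the expand-then-aggregate idea, assuming rows are plain dicts. The names `expand`, `aggregate_sum`, and the `gid` column are hypothetical and illustrate the concept only; they are not TiDB's operator or API. Each input row is copied once per grouping set, columns outside the set are set to NULL (`None`), and a single hash aggregation over `(gid, a, b)` then produces every level at once.

```python
from collections import defaultdict

def expand(rows, grouping_sets, group_cols):
    """Emit one copy of every input row per grouping set.

    Grouping columns that are not in the current set are replaced with None
    (SQL NULL), and a grouping id (gid) records which set the copy belongs to.
    """
    for row in rows:
        for gid, gset in enumerate(grouping_sets):
            out = dict(row)
            for col in group_cols:
                if col not in gset:
                    out[col] = None
            out["gid"] = gid
            yield out

def aggregate_sum(expanded, group_cols, value_col):
    """A single hash aggregation keyed by (gid, grouping columns)."""
    sums = defaultdict(int)
    for row in expanded:
        key = (row["gid"],) + tuple(row[c] for c in group_cols)
        sums[key] += row[value_col]
    return dict(sums)

# Hypothetical input for: SELECT a, b, sum(v) ... GROUPING SETS((a, b), (a), ())
rows = [{"a": 1, "b": 1, "v": 10}, {"a": 1, "b": 2, "v": 20}, {"a": 2, "b": 1, "v": 30}]
result = aggregate_sum(expand(rows, [("a", "b"), ("a",), ()], ["a", "b"]), ["a", "b"], "v")
print(result[(2, None, None)])  # 60, the grand total from the () grouping set
```

The grouping id is part of the aggregation key so that a NULL introduced by expansion cannot be confused with a NULL that was already in the data, and so that rows from different grouping sets are never merged into one group.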

Besides listing grouping sets explicitly at the SQL level, a composed set of grouping sets can also be described implicitly, which is exactly what the ROLLUP and CUBE syntax does. For example, ROLLUP(a, b, c) implies N = 4 grouping sets built by adding one column at a time: (), (a), (a, b), and (a, b, c). CUBE works the same way but is more complex, because it implies every subset of its columns (2^3 = 8 grouping sets for three columns).
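The derivation of the implicit grouping sets can be sketched as follows. The helper names `rollup_sets` and `cube_sets` are hypothetical, and the ordering of the returned sets is only one possible choice: ROLLUP takes the n + 1 prefixes of its column list, while CUBE takes all 2^n subsets.

```python
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP(c1, ..., cn): the n + 1 prefixes, from () up to (c1, ..., cn)."""
    return [tuple(cols[:i]) for i in range(len(cols) + 1)]

def cube_sets(cols):
    """CUBE(c1, ..., cn): every subset of the columns, 2**n sets in total."""
    return [s for k in range(len(cols) + 1) for s in combinations(cols, k)]

print(rollup_sets(["a", "b", "c"]))     # [(), ('a',), ('a', 'b'), ('a', 'b', 'c')]
print(len(cube_sets(["a", "b", "c"])))  # 8
```

Either list can then be fed to the same Expand step, which is why ROLLUP, CUBE, and explicit GROUPING SETS can share one underlying implementation.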

For Multi Distinct Aggregate case like

select count(distinct a), count(distinct b) from t

Because of DISTINCT, the first aggregate needs the rows grouped by a and the second needs them grouped by b, but a single copy of the data cannot be grouped by a and by b at the same time. We therefore use separate grouping sets, (a) and (b), so that the underlying data is expanded into one copy per grouping set, each feeding its own aggregation.
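A single-process sketch of the multi-distinct rewrite, assuming rows are dicts and using the hypothetical helper `multi_distinct_count`. Each row is expanded into one copy per DISTINCT column, tagged with a grouping id; deduplicating on `(gid, value)` is the step that, in an MPP plan, could presumably be hash-partitioned across nodes; a final pass counts the surviving keys per gid.

```python
def multi_distinct_count(rows, distinct_cols):
    """Compute count(DISTINCT c) for several columns via expand + dedup."""
    deduped = set()
    for row in rows:
        # Expand: one copy per distinct column, keyed by (gid, value).
        for gid, col in enumerate(distinct_cols):
            value = row[col]
            if value is not None:  # COUNT(DISTINCT ...) ignores NULLs
                deduped.add((gid, value))
    # Final aggregation: count the distinct keys that belong to each gid.
    counts = [0] * len(distinct_cols)
    for gid, _ in deduped:
        counts[gid] += 1
    return counts

# select count(distinct a), count(distinct b) from t
rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "y"}, {"a": 3, "b": None}]
print(multi_distinct_count(rows, ["a", "b"]))  # [3, 2]
```

Because the dedup key includes the gid, values of a and b never collide even when they are equal, so both aggregates can be computed from one expanded stream.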

All three cases above depend on an implementation of grouping sets and an Expand operator, which is what this issue requests.

Describe the feature you'd like:

Shown above.

Describe alternatives you've considered:

Rollup syntax: as a workaround, rewrite the SQL as a UNION of several subqueries, each with its own GROUP BY items.
Multi-distinct aggregate optimization: there is no workaround; the computation cannot be distributed across multiple MPP nodes.
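The ROLLUP workaround can be illustrated with an in-memory SQLite database, used here only because it runs anywhere and has no ROLLUP of its own; the table `t` and its data are hypothetical. Each grouping set becomes one subquery, with rolled-up columns filled with NULL, combined with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, 1, 10), (1, 2, 20), (2, 1, 30)])

# Hand-written equivalent of: SELECT a, b, sum(v) FROM t GROUP BY a, b WITH ROLLUP
# One branch per grouping set: (a, b), (a), ().
rollup_rewrite = """
SELECT a, b, sum(v) FROM t GROUP BY a, b
UNION ALL
SELECT a, NULL, sum(v) FROM t GROUP BY a
UNION ALL
SELECT NULL, NULL, sum(v) FROM t
"""
rows = conn.execute(rollup_rewrite).fetchall()
for row in rows:
    print(row)
```

The rewrite is correct but each branch reads the table again, so the cost grows with the number of grouping sets; an Expand operator instead reads the input once and copies rows in memory.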

Teachability, Documentation, Adoption, Migration Strategy:

Related issues schedule:

Underlying Grouping Sets and Expand Operator

Rollup Syntax

Infra Support

Plan Operator

Bug Fix

AilinKid added the type/feature-request, sig/planner, and sig/execution labels Mar 28, 2023
ti-chi-bot bot pushed a commit that referenced this issue Jul 2, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 24, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 26, 2024
hawkingrei pushed a commit to hawkingrei/tidb that referenced this issue Aug 1, 2024
hawkingrei pushed a commit to hawkingrei/tidb that referenced this issue Aug 1, 2024