Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48760][SQL] Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change clustering columns #47156

Closed
wants to merge 5 commits into from

Conversation

zedtang
Copy link
Contributor

@zedtang zedtang commented Jul 1, 2024

What changes were proposed in this pull request?

Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change the clustering columns:

ALTER TABLE tbl CLUSTER BY (a, b);  -- update clustering columns to a and b
ALTER TABLE tbl CLUSTER BY NONE;  -- remove clustering columns

This change updates the clustering columns for catalogs to utilize. Clustering columns are maintained in:

  • CatalogTable's PROP_CLUSTERING_COLUMNS for session catalog
  • Table's partitioning transform array for V2 catalog

which is consistent with CREATE TABLE CLUSTER BY( #42577).

Why are the changes needed?

Provides a way to update the clustering columns.

Does this PR introduce any user-facing change?

Yes, it introduces new SQL syntax and a new keyword NONE.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 1, 2024
@zedtang
Copy link
Contributor Author

zedtang commented Jul 1, 2024

@cloud-fan, @imback82, this PR is ready for review, thanks!

@zedtang zedtang changed the title [SPARK-48760] Introduce ALTER TABLE CLUSTER BY SQL syntax to change clustering columns [SPARK-48760][SQL] Introduce ALTER TABLE CLUSTER BY SQL syntax to change clustering columns Jul 1, 2024
@github-actions github-actions bot added the DOCS label Jul 1, 2024
@zedtang zedtang changed the title [SPARK-48760][SQL] Introduce ALTER TABLE CLUSTER BY SQL syntax to change clustering columns [SPARK-48760][SQL] Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change clustering columns Jul 1, 2024
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b8cc91c Jul 3, 2024
cloud-fan pushed a commit that referenced this pull request Jul 9, 2024
…yntax-ddl-alter-table.md`

### What changes were proposed in this pull request?
The pr is  following up #47156, aims to
- add `CLUSTER BY` to doc `sql-ref-syntax-ddl-alter-table.md`
- move parser tests from `o.a.s.s.c.p.DDLParserSuite` to `AlterTableClusterByParserSuite`
- use `checkError` to check exception in `o.a.s.s.e.c.AlterTableClusterBySuiteBase`

### Why are the changes needed?
- Enable the doc `sql-ref-syntax-ddl-alter-table.md` to cover new syntax `ALTER TABLE ... CLUSTER BY ...`.
- Align with other similar tests, eg: AlterTableRename*

### Does this PR introduce _any_ user-facing change?
Yes, Make end-users can query the explanation of `CLUSTER BY` through the doc `sql-ref-syntax-ddl-alter-table.md`.

### How was this patch tested?
Updated UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47254 from panbingkun/SPARK-48760_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
… change clustering columns

### What changes were proposed in this pull request?

Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change the clustering columns:
```sql
ALTER TABLE tbl CLUSTER BY (a, b);  -- update clustering columns to a and b
ALTER TABLE tbl CLUSTER BY NONE;  -- remove clustering columns
```

This change updates the clustering columns for catalogs to utilize. Clustering columns are maintained in:
* CatalogTable's `PROP_CLUSTERING_COLUMNS` for session catalog
* Table's `partitioning` transform array for V2 catalog

which is consistent with CREATE TABLE CLUSTER BY( apache#42577).

### Why are the changes needed?

Provides a way to update the clustering columns.

### Does this PR introduce _any_ user-facing change?

Yes, it introduces new SQL syntax and a new keyword NONE.

### How was this patch tested?

New unit tests.
### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47156 from zedtang/alter-table-cluster-by.

Lead-authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
…yntax-ddl-alter-table.md`

### What changes were proposed in this pull request?
The pr is  following up apache#47156, aims to
- add `CLUSTER BY` to doc `sql-ref-syntax-ddl-alter-table.md`
- move parser tests from `o.a.s.s.c.p.DDLParserSuite` to `AlterTableClusterByParserSuite`
- use `checkError` to check exception in `o.a.s.s.e.c.AlterTableClusterBySuiteBase`

### Why are the changes needed?
- Enable the doc `sql-ref-syntax-ddl-alter-table.md` to cover new syntax `ALTER TABLE ... CLUSTER BY ...`.
- Align with other similar tests, eg: AlterTableRename*

### Does this PR introduce _any_ user-facing change?
Yes, Make end-users can query the explanation of `CLUSTER BY` through the doc `sql-ref-syntax-ddl-alter-table.md`.

### How was this patch tested?
Updated UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47254 from panbingkun/SPARK-48760_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@zedtang zedtang deleted the alter-table-cluster-by branch July 11, 2024 16:06
cloud-fan pushed a commit that referenced this pull request Jul 12, 2024
### What changes were proposed in this pull request?

#47156 introduced a bug in `CatalogV2Util.applyClusterByChanges` that it will remove the existing `ClusterByTransform` first, regardless of whether there is a `ClusterBy` table change. This means any table change will remove the clustering columns from the table.

This PR fixes the bug by removing the `ClusterByTransform` only when there is a `ClusterBy` table change.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Amend existing test to catch this bug.
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47288 from zedtang/fix-apply-cluster-by-changes.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
…yntax-ddl-alter-table.md`

### What changes were proposed in this pull request?
The pr is  following up apache#47156, aims to
- add `CLUSTER BY` to doc `sql-ref-syntax-ddl-alter-table.md`
- move parser tests from `o.a.s.s.c.p.DDLParserSuite` to `AlterTableClusterByParserSuite`
- use `checkError` to check exception in `o.a.s.s.e.c.AlterTableClusterBySuiteBase`

### Why are the changes needed?
- Enable the doc `sql-ref-syntax-ddl-alter-table.md` to cover new syntax `ALTER TABLE ... CLUSTER BY ...`.
- Align with other similar tests, eg: AlterTableRename*

### Does this PR introduce _any_ user-facing change?
Yes, Make end-users can query the explanation of `CLUSTER BY` through the doc `sql-ref-syntax-ddl-alter-table.md`.

### How was this patch tested?
Updated UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47254 from panbingkun/SPARK-48760_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?

apache#47156 introduced a bug in `CatalogV2Util.applyClusterByChanges` that it will remove the existing `ClusterByTransform` first, regardless of whether there is a `ClusterBy` table change. This means any table change will remove the clustering columns from the table.

This PR fixes the bug by removing the `ClusterByTransform` only when there is a `ClusterBy` table change.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Amend existing test to catch this bug.
### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47288 from zedtang/fix-apply-cluster-by-changes.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants