

[SPARK-48760][SQL] Fix CatalogV2Util.applyClusterByChanges #47288

Closed
wants to merge 1 commit into from


zedtang
Contributor

@zedtang zedtang commented Jul 10, 2024

What changes were proposed in this pull request?

#47156 introduced a bug in `CatalogV2Util.applyClusterByChanges`: it always removes the existing `ClusterByTransform` first, regardless of whether there is a `ClusterBy` table change. As a result, any table change (for example, updating a column comment) removes the clustering columns from the table.

This PR fixes the bug by removing the `ClusterByTransform` only when there is a `ClusterBy` table change.
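To make the difference concrete, here is a minimal Scala sketch of the buggy and the fixed behavior. It assumes hypothetical, simplified stand-ins (`Transform`, `IdentityTransform`, `ClusterByTransform`, `TableChange`, `UpdateColumnComment`, `ClusterBy`, `applyClusterByChangesBuggy`, `applyClusterByChangesFixed`). They are loosely named after the classes mentioned above but are not Spark's actual `CatalogV2Util` API, whose types and signatures differ. Letting the first `ClusterBy` change win when several are present is also an assumption of the sketch.

```scala
// Hypothetical, simplified model of a table's partitioning transforms and of
// table changes. These are NOT Spark's real Transform / TableChange classes.
object ClusterByChangesSketch {
  sealed trait Transform
  final case class IdentityTransform(column: String) extends Transform
  final case class ClusterByTransform(columns: Seq[String]) extends Transform

  sealed trait TableChange
  final case class UpdateColumnComment(column: String, comment: String) extends TableChange
  final case class ClusterBy(columns: Seq[String]) extends TableChange

  // Buggy shape: the existing ClusterByTransform is always dropped first, so a
  // change list without any ClusterBy entry silently loses the clustering columns.
  def applyClusterByChangesBuggy(
      partitioning: Seq[Transform],
      changes: Seq[TableChange]): Seq[Transform] = {
    val withoutClusterBy = partitioning.filterNot(_.isInstanceOf[ClusterByTransform])
    val newClusterBy = changes.collectFirst { case ClusterBy(cols) => ClusterByTransform(cols) }
    withoutClusterBy ++ newClusterBy.toList
  }

  // Fixed shape: the existing ClusterByTransform is replaced only when the
  // changes contain a ClusterBy entry; otherwise the partitioning is unchanged.
  def applyClusterByChangesFixed(
      partitioning: Seq[Transform],
      changes: Seq[TableChange]): Seq[Transform] =
    changes.collectFirst { case ClusterBy(cols) => cols } match {
      case Some(cols) =>
        partitioning.filterNot(_.isInstanceOf[ClusterByTransform]) :+ ClusterByTransform(cols)
      case None =>
        partitioning
    }
}
```

For example, given partitioning that contains `ClusterByTransform(Seq("col1", "col2.x"))` and a change list holding only an `UpdateColumnComment`, the buggy version returns the partitioning without the clustering transform, while the fixed version returns it unchanged. When a `ClusterBy` change is present, both versions produce the same result.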

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

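Amended an existing test to catch this bug: it now runs `ALTER TABLE ... ALTER COLUMN col1 COMMENT` on a clustered table before checking the `DESC` output.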

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 10, 2024
```scala
s"$defaultUsing CLUSTER BY (col1, col2.x)")
sql(s"ALTER TABLE $tbl ALTER COLUMN col1 COMMENT 'this is comment';")
val descriptionDf = sql(s"DESC $tbl")
assert(descriptionDf.schema.map(field => (field.name, field.dataType)) === Seq(
("col_name", StringType),
```
Contributor Author

This updated test would fail without the fix.

@zedtang
Contributor Author

zedtang commented Jul 11, 2024

Hi @cloud-fan, @imback82, @dabao521, this PR is ready for review, thanks!

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in cbe6846 Jul 12, 2024
@zedtang zedtang deleted the fix-apply-cluster-by-changes branch July 12, 2024 18:05
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?

apache#47156 introduced a bug in `CatalogV2Util.applyClusterByChanges`: it always removes the existing `ClusterByTransform` first, regardless of whether there is a `ClusterBy` table change. As a result, any table change removes the clustering columns from the table.

This PR fixes the bug by removing the `ClusterByTransform` only when there is a `ClusterBy` table change.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Amended an existing test to catch this bug.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47288 from zedtang/fix-apply-cluster-by-changes.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>