Stats missing for dataSkippingStatsColumns when escaping column name #2849

Closed
mattfysh opened this issue Sep 6, 2024 · 1 comment · Fixed by #2855
Labels
bug Something isn't working

mattfysh commented Sep 6, 2024

Environment

Delta-rs version: 0.19.2

Binding: Python

Environment:

  • Cloud provider:
  • OS: MacOS
  • Other:

Bug

What happened:

Stats are not captured for some columns listed in delta.dataSkippingStatsColumns. If you write the table without that option, the same columns do appear in the min/max stats.

In the case below, only stats for "x" are captured. It appears that escaping a column name with backticks causes the column to be skipped.

However, some column names must be escaped, otherwise you get an error, e.g.

Stats column z.z not found in schema

What you expected to happen:

Stats are captured for every column specified, in this case all of "x", "y" and "z.z".
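
To make the expectation concrete, here is a minimal sketch of how such a value could be split into column names. `parse_stats_columns` is a hypothetical helper written for illustration, not delta-rs code. It assumes that commas separate columns, dots separate nested field names, and backticks quote a name so that dots and commas inside it are taken literally.

```python
def parse_stats_columns(spec: str) -> list[list[str]]:
    """Split a dataSkippingStatsColumns value into column paths.

    Hypothetical illustration only. Each path is a list of field names,
    so a quoted name such as `z.z` stays one element, while an unquoted
    a.b becomes two.
    """
    if not spec.strip():
        return []

    paths: list[list[str]] = []
    path: list[str] = []
    name: list[str] = []
    quoted = False

    for ch in spec:
        if ch == "`":
            # Backticks toggle quoting and are not part of the name.
            quoted = not quoted
        elif ch == "." and not quoted:
            # An unquoted dot ends one field name in a nested path.
            path.append("".join(name))
            name = []
        elif ch == "," and not quoted:
            # An unquoted comma ends the whole column path.
            path.append("".join(name))
            paths.append(path)
            path, name = [], []
        elif ch.isspace() and not quoted:
            # Ignore whitespace around names outside backticks.
            continue
        else:
            name.append(ch)

    if quoted:
        raise ValueError("unterminated backtick in stats column list")

    path.append("".join(name))
    paths.append(path)
    return paths
```

Under these assumptions, the quoted and unquoted forms of a simple name resolve to the same column, which is why "x" and "`y`" would be expected to behave identically.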

How to reproduce it:

import polars as pl

df = pl.DataFrame({"x": [1, 2], "y": [5, 6], "z.z": [3, 4]})

df.write_delta(
    "table",
    delta_write_options={
        "configuration": {"delta.dataSkippingStatsColumns": "x,`y`,`z.z`"}
    },
)
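
One way to check which columns actually received stats is to read the JSON commit files under `_delta_log` directly. The sketch below is a rough helper (the name `columns_with_stats` is made up here) that relies only on the Delta log layout: each line of a commit file is one action, and an `add` action carries its `stats` as a JSON-encoded string with a `minValues` object. It ignores checkpoint files, so it only suits small, freshly written tables like this one.

```python
import json
from pathlib import Path


def columns_with_stats(table_path: str) -> set[str]:
    """Return top-level column names that have min-value stats in any add action."""
    columns: set[str] = set()
    log_dir = Path(table_path) / "_delta_log"

    # Commit files are zero-padded, so a lexical sort is chronological.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            add = action.get("add")
            if not add or not add.get("stats"):
                continue
            # "stats" is itself a JSON string inside the add action.
            stats = json.loads(add["stats"])
            columns.update(stats.get("minValues", {}))

    return columns
```

Calling `columns_with_stats("table")` after the write above lists the columns for which min values were recorded, which can then be compared with the names passed in `delta.dataSkippingStatsColumns`.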

More details:

mattfysh added the bug label Sep 6, 2024

mattfysh commented Sep 6, 2024

A somewhat related bug: if you specify a binary column in delta.dataSkippingStatsColumns you get the error below, even though the same column works when the DSSC config option is not specified.

_internal.DeltaError: Generic DeltaTable error: Stats column binary_column has unsupported type binary

It seems odd that, once you use the DSSC config option, a lot of things that normally work suddenly break?

mattfysh changed the title from "Stats missing for dataSkippingStatsColumns when special characters" to "Stats missing for dataSkippingStatsColumns when escaping column name" Sep 6, 2024