Error decoding field 'stats' when creating checkpoint #2743
Comments
I fixed the error by changing the type of the stats field, but it feels like a workaround.
Before:
static ref ADD_FIELDS: Vec<ArrowField> = arrow_defs![
    path:Utf8,
    size:Int64,
    modificationTime:Int64,
    dataChange:Boolean,
    stats:Utf8,
After:
static ref ADD_FIELDS: Vec<ArrowField> = arrow_defs![
    path:Utf8,
    size:Int64,
    modificationTime:Int64,
    dataChange:Boolean,
    stats:LargeUtf8,
After the change, Spark stopped reading data.
Utf8 is limited to about 2 GB of data @_@ Can you share your table schema and Spark error? I can kind of reproduce it (please don't do it if you love your laptop).
EDIT1: you can also recreate it using this; it will stop when the combined log is more than 2 GB.
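To illustrate the 2 GB limit mentioned above: Arrow's Utf8 type addresses string bytes with signed 32-bit offsets, while LargeUtf8 uses 64-bit offsets. The following is only a minimal std-only sketch of that bookkeeping, not the arrow-rs implementation; the function names `utf8_offsets` and `large_utf8_offsets` are hypothetical, and the sketch works on string lengths alone so it does not need to allocate gigabytes.

```rust
/// Build an offsets buffer the way an i32-offset (Utf8-like) column would,
/// failing once the running byte total exceeds i32::MAX.
/// Hypothetical helper for illustration only.
fn utf8_offsets(lengths: &[u64]) -> Result<Vec<i32>, String> {
    let mut offsets = vec![0i32];
    let mut total: u64 = 0;
    for &len in lengths {
        total = total.saturating_add(len);
        if total > i32::MAX as u64 {
            return Err(format!("offset overflow at {} bytes", total));
        }
        offsets.push(total as i32);
    }
    Ok(offsets)
}

/// Same bookkeeping with i64 offsets (LargeUtf8-like).
/// Hypothetical helper for illustration only.
fn large_utf8_offsets(lengths: &[u64]) -> Result<Vec<i64>, String> {
    let mut offsets = vec![0i64];
    let mut total: u64 = 0;
    for &len in lengths {
        total = total.saturating_add(len);
        if total > i64::MAX as u64 {
            return Err(format!("offset overflow at {} bytes", total));
        }
        offsets.push(total as i64);
    }
    Ok(offsets)
}

fn main() {
    // Two strings whose combined size is one byte past i32::MAX.
    let lengths = [i32::MAX as u64, 1];
    println!("i32 offsets: {:?}", utf8_offsets(&lengths).map(|o| o.len()));
    println!("i64 offsets: {:?}", large_utf8_offsets(&lengths).map(|o| o.len()));
}
```

The limit applies to the combined bytes of all values in one array, which is consistent with the observation that the failure shows up once the combined log passes 2 GB rather than when any single stats string is large.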
Hi, sorry for the delay. Here is the schema:
root
|-- field: timestamp (nullable = true)
|-- field: struct (nullable = true)
| |-- field: long (nullable = true)
| |-- field: long (nullable = true)
| |-- field: long (nullable = true)
|-- field: struct (nullable = true)
| |-- field: long (nullable = true)
| |-- field: long (nullable = true)
| |-- field: string (nullable = true)
| |-- field: long (nullable = true)
| |-- field: long (nullable = true)
|-- field: struct (nullable = true)
| |-- field: string (nullable = true)
| |-- field: decimal(30,15) (nullable = true)
| |-- field: boolean (nullable = true)
| |-- field: integer (nullable = true)
| |-- field: struct (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: boolean (nullable = true)
| | |-- field: boolean (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: integer (containsNull = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: string (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: integer (containsNull = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: string (nullable = true)
| |-- field: struct (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: array (nullable = true)
| | | | | |-- field: string (containsNull = true)
| | | | |-- field: integer (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: boolean (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: array (nullable = true)
| | | | | |-- field: string (containsNull = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: long (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: boolean (nullable = true)
| | | |-- field: boolean (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: integer (containsNull = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: integer (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: long (nullable = true)
| |-- field: struct (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: boolean (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: boolean (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | |-- field: array (nullable = true)
| | | |-- field: string (containsNull = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: timestamp (nullable = true)
| | |-- field: long (nullable = true)
| | |-- field: long (nullable = true)
| | |-- field: boolean (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: long (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: array (nullable = true)
| | | |-- field: string (containsNull = true)
| | |-- field: boolean (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: map (nullable = true)
| | | | |-- field: string
| | | | |-- field: string (valueContainsNull = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
|-- field: struct (nullable = true)
| |-- field: string (nullable = true)
| |-- field: string (nullable = true)
| |-- field: string (nullable = true)
| |-- field: string (nullable = true)
| |-- field: decimal(30,15) (nullable = true)
| |-- field: struct (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: array (nullable = true)
| | | |-- field: string (containsNull = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: array (nullable = true)
| | | |-- field: string (containsNull = true)
| | |-- field: integer (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: array (nullable = true)
| | | |-- field: integer (containsNull = true)
| | |-- field: boolean (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| |-- field: struct (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: boolean (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: string (nullable = true)
| | | |-- field: integer (nullable = true)
| | | |-- field: integer (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: array (nullable = true)
| | | |-- field: string (containsNull = true)
| | |-- field: string (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: decimal(30,15) (nullable = true)
| | | |-- field: integer (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: double (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: double (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: long (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: string (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: decimal(30,15) (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: double (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | | |-- field: long (nullable = true)
| | |-- field: struct (nullable = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
| | | |-- field: array (nullable = true)
| | | | |-- field: string (containsNull = true)
|-- field: integer (nullable = true)
|-- field: integer (nullable = true)
|-- field: integer (nullable = true)
|-- field: integer (nullable = true)
|-- field: struct (nullable = true)
| |-- field: struct (nullable = true)
| | |-- field: double (nullable = true)
| | |-- field: integer (nullable = true)
| | |-- field: integer (nullable = true)
| |-- field: array (nullable = true)
| | |-- field: struct (containsNull = true)
| | | |-- field: string (nullable = true)
| |-- field: array (nullable = true)
| | |-- field: string (containsNull = true)
| |-- field: array (nullable = true)
| | |-- field: struct (containsNull = true)
| | | |-- field: string (nullable = true)
| | | |-- field: double (nullable = true)
| | | |-- field: struct (nullable = true)
| | | | |-- field: string (nullable = true)
| | | | |-- field: double (nullable = true)
| | | | |-- field: struct (nullable = true)
| | | | | |-- field: string (nullable = true)
| | | | | |-- field: double (nullable = true)
| | | | |-- field: array (nullable = true)
| | | | | |-- field: struct (containsNull = true)
| | | | | | |-- field: string (nullable = true)
| | | | | | |-- field: double (nullable = true)
 | | | |-- field: string (nullable = true)
Spark does not return any errors; it simply ignores the checkpoints that were recorded with that change.
No huge JSON logs, 30 KB on average.
We currently work around this by using a multi-part checkpoint that is created when running the optimization on Databricks, but this is very slow.
Environment
Delta-rs version: 0.18.1
Binding: rust
Bug
What happened:
While running kafka-delta-ingest, the service crashes with an error when trying to create a checkpoint:
DeltaTable { source: Arrow { source: JsonError("whilst decoding field 'add': while decoding field 'stats': offset overflow decoding Utf8") } }
The error pops up here: https://github.com/apache/arrow-rs/blob/49840ec0f110da5e9a21ce97affd32313d0b720f/arrow-json/src/reader/string_array.rs#L81
What you expected to happen:
The checkpoint will be created successfully
How to reproduce it:
Not sure how to reproduce this; presumably it would require a table with a large delta log and data volume.
More details:
Not sure what additional information to provide:
Partitioning by year, month, day, and hour;
Current transaction number: 00000000000002020757;
Approximately 4 TB are written to the table per day;
Each parquet file is on average 800 MB.
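As a rough back-of-the-envelope check of the figures above, the sketch below estimates how many files are added per day and how many `add` actions could fit before a 32-bit string offset overflows. The average `stats` size per file is not reported in this issue; the 4 KiB used here is purely an assumption, as are the helper names, and the estimate ignores removed files.

```rust
/// Largest total byte count an i32-offset (Utf8) string column can address.
const UTF8_MAX_BYTES: u64 = i32::MAX as u64;

/// Files written per day for a given daily volume and average file size.
fn files_per_day(bytes_per_day: u64, avg_file_bytes: u64) -> u64 {
    bytes_per_day / avg_file_bytes
}

/// Number of `add` actions whose `stats` strings fit under the Utf8 limit,
/// for an ASSUMED average stats size in bytes.
fn adds_before_overflow(avg_stats_bytes: u64) -> u64 {
    UTF8_MAX_BYTES / avg_stats_bytes
}

fn main() {
    // From the issue: ~4 TB per day, ~800 MB per parquet file.
    let per_day = files_per_day(4_000_000_000_000, 800_000_000);
    // ASSUMPTION: 4 KiB of stats JSON per file (not stated in the issue).
    let adds = adds_before_overflow(4096);
    println!("files per day: {}", per_day);
    println!("add actions before overflow: {}", adds);
    println!("days of ingest to reach that: {}", adds / per_day);
}
```

With a schema as wide as the one posted in the comments, per-file stats could be considerably larger than the assumed value, which would lower the number of add actions needed to hit the limit.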