[BUG] `test_column_add_after_partition` failed on EGX Standalone cluster #10591

parthosa · 2024-03-14T00:48:27Z

Test failed due to the following error:

java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -184 because the size is negative

Full Output

[2024-03-13T23:15:17.622Z] E  py4j.protocol.Py4JJavaError: An error occurred while calling o6966656.collectToPython.
[2024-03-13T23:15:17.622Z] E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 23 in stage 105354.0 failed 1 times, most recent failure: Lost task 23.0 in stage 105354.0 (TID 2832973) (10.136.6.4 executor 2): java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -184 because the size is negative
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_2$(Unknown Source)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
[2024-03-13T23:15:17.622Z] E at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:365)


jlowe · 2024-03-14T16:05:50Z

I can reproduce this locally via:

TEST_PARALLEL=0 TZ=UTC PYSP_TEST_spark_master="local[24]" SPARK_HOME=/home/jlowe/spark-3.3.3-bin-hadoop3/ integration_tests/run_pyspark_from_build.sh -k "test_column_add_after_partition and parquet"

jlowe · 2024-03-14T19:43:11Z

I've narrowed the issue down to loading this Parquet file, which was dumped while running the test. Loading it with the RAPIDS Accelerator results in the BufferHolder negative size issue.
1418348638.parquet.gz

Note that it does not fail if I load only column new_10, which is the problematic column, but it does fail if I load new_11 along with it. new_10 ends up with an invalid offset vector, where next_offset >= prev_offset is not always true. Since the array size is computed as next_offset - prev_offset, that explains how we end up with a negative array size with this data.
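To illustrate the invariant described above, here is a minimal sketch in plain Python. It is not code from the plugin or from libcudf. The helper name `find_negative_list_sizes` and the sample offsets are hypothetical; the offsets are made up so that one row reproduces the -184 from the error message, and they are not taken from the failing file.

```python
from typing import List, Tuple


def find_negative_list_sizes(offsets: List[int]) -> List[Tuple[int, int]]:
    """Return (row_index, size) for every row whose computed size is negative.

    A list column with N rows carries N + 1 offsets, and the size of row i
    is offsets[i + 1] - offsets[i]. That only makes sense when the offsets
    never decrease.
    """
    bad = []
    for row, (prev_offset, next_offset) in enumerate(zip(offsets, offsets[1:])):
        size = next_offset - prev_offset
        if size < 0:
            bad.append((row, size))
    return bad


# Hypothetical offsets: the drop from 191 back to 7 breaks the invariant.
sample_offsets = [0, 3, 7, 191, 7, 12]
print(find_negative_list_sizes(sample_offsets))  # [(3, -184)]
```

A consumer that trusts the offsets and asks for a buffer of `next_offset - prev_offset` elements would request a negative size for row 3 here, which matches the shape of the `BufferHolder.grow` failure in the stack trace.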

jlowe · 2024-03-14T19:52:06Z

Interestingly, the chunked reader needs to be enabled to trigger the issue. Disabling it allows it to load the file properly.

jlowe · 2024-03-14T20:11:47Z

@nvdbaranec was able to reproduce the problem on the test file with a C++ program using the chunked reader from libcudf. Filed rapidsai/cudf#15306.

parthosa added the bug, ? - Needs Triage, and test labels Mar 14, 2024

jlowe self-assigned this Mar 14, 2024

jlowe added the cudf_dependency label Mar 14, 2024

sameerz removed the ? - Needs Triage label Mar 14, 2024

jlowe closed this as completed Mar 26, 2024
