[BUG] test_column_add_after_partition failed on EGX Standalone cluster #10591

Closed
parthosa opened this issue Mar 14, 2024 · 4 comments
Labels
bug (Something isn't working), cudf_dependency (An issue or PR with this label depends on a new feature in cudf), test (Only impacts tests)

Comments


parthosa commented Mar 14, 2024

Test failed due to the following error:

java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -184 because the size is negative
Full Output
[2024-03-13T23:15:17.622Z] E  py4j.protocol.Py4JJavaError: An error occurred while calling o6966656.collectToPython.
[2024-03-13T23:15:17.622Z] E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 23 in stage 105354.0 failed 1 times, most recent failure: Lost task 23.0 in stage 105354.0 (TID 2832973) (10.136.6.4 executor 2): java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -184 because the size is negative
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_2$(Unknown Source)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
[2024-03-13T23:15:17.622Z] E at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2024-03-13T23:15:17.622Z] E at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:365)
parthosa added the bug, "? - Needs Triage", and test labels Mar 14, 2024
jlowe self-assigned this Mar 14, 2024

jlowe commented Mar 14, 2024

I can reproduce this locally via:

TEST_PARALLEL=0 TZ=UTC PYSP_TEST_spark_master="local[24]" SPARK_HOME=/home/jlowe/spark-3.3.3-bin-hadoop3/ integration_tests/run_pyspark_from_build.sh -k "test_column_add_after_partition and parquet"


jlowe commented Mar 14, 2024

I've narrowed the issue down to loading this Parquet file, which was dumped while running the test. Loading it with the RAPIDS Accelerator triggers the BufferHolder negative-size error.
1418348638.parquet.gz

Note that it does not fail if I load only column new_10, which is the problematic column, but it does fail if I load new_11 along with it. new_10 ends up with an invalid offset vector, one where next_offset >= prev_offset does not always hold. Since the array size is computed as next_offset - prev_offset, that explains how we end up with a negative array size with this data.
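
To make the offset invariant concrete, here is a minimal Python sketch of the check described above. It is not code from the RAPIDS Accelerator or libcudf; the helper names and the sample offsets are hypothetical and are not taken from the attached file. It only illustrates how a non-monotonic offset vector yields a negative array size.

```python
from typing import List, Tuple


def array_sizes(offsets: List[int]) -> List[int]:
    """Size of each list row, computed as next_offset - prev_offset."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


def find_offset_violations(offsets: List[int]) -> List[Tuple[int, int, int]]:
    """Return (row, prev_offset, next_offset) wherever next_offset < prev_offset."""
    violations = []
    for row in range(len(offsets) - 1):
        prev_offset, next_offset = offsets[row], offsets[row + 1]
        if next_offset < prev_offset:
            violations.append((row, prev_offset, next_offset))
    return violations


# Hypothetical offsets for illustration only.
valid_offsets = [0, 3, 3, 7]
invalid_offsets = [0, 3, 10, 4, 9]

print(array_sizes(valid_offsets))               # [3, 0, 4]
print(array_sizes(invalid_offsets))             # [3, 7, -6, 5]
print(find_offset_violations(invalid_offsets))  # [(2, 10, 4)]
```

A valid offset vector is non-decreasing, so every computed size is zero or positive. A single out-of-order entry is enough to produce one negative size.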


jlowe commented Mar 14, 2024

Interestingly, the chunked reader needs to be enabled to trigger the issue. With it disabled, the file loads properly.
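
For anyone comparing the two behaviors, the sketch below builds the `--conf` arguments for a run with the chunked reader on or off. It assumes the setting is named `spark.rapids.sql.reader.chunked`; that key is not stated in this thread, so check it against the RAPIDS Accelerator configuration docs for your version. The helper name is hypothetical.

```python
from typing import List

# Assumption: this is the RAPIDS Accelerator key that toggles the chunked reader.
CHUNKED_READER_KEY = "spark.rapids.sql.reader.chunked"


def chunked_reader_conf_args(enabled: bool) -> List[str]:
    """Build spark-submit style arguments that turn the chunked reader on or off."""
    value = "true" if enabled else "false"
    return ["--conf", f"{CHUNKED_READER_KEY}={value}"]


# One argument list per run, so the same Parquet load can be compared both ways.
print(chunked_reader_conf_args(True))
print(chunked_reader_conf_args(False))
```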

jlowe added the cudf_dependency label Mar 14, 2024

jlowe commented Mar 14, 2024

@nvdbaranec was able to reproduce the problem on the test file with a C++ program using the chunked reader from libcudf. Filed rapidsai/cudf#15306.

sameerz removed the "? - Needs Triage" label Mar 14, 2024
jlowe closed this as completed Mar 26, 2024