[BUG] orc_write_test failed in databricks runtime #3017

Closed
pxLi opened this issue Jul 24, 2021 · 7 comments
Labels: bug (Something isn't working), P0 (Must have for release), test (Only impacts tests)

Comments

pxLi (Collaborator) commented Jul 24, 2021

Describe the bug
This failure was hidden by #3016; I found it while skimming the logs.

[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_round_trip[native-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_round_trip[hive-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_save_table[native-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_save_table[hive-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[native-TIMESTAMP_MICROS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[native-TIMESTAMP_MILLIS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[hive-TIMESTAMP_MICROS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[hive-TIMESTAMP_MILLIS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]

All of the tests above failed with the same error; detailed log below:

[2021-07-23T15:21:18.191Z] _ test_write_round_trip[native-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]] _
[2021-07-23T15:21:18.191Z] [gw4] linux -- Python 3.7.10 /databricks/conda/envs/databricks-ml-gpu/bin/python
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z] spark_tmp_path = '/tmp/pyspark_tests//363/'
[2021-07-23T15:21:18.191Z] orc_gens = [Byte, Short, Integer, Long, Float, Double, ...], orc_impl = 'native'
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]     @pytest.mark.parametrize('orc_gens', orc_write_gens_list, ids=idfn)
[2021-07-23T15:21:18.191Z]     @pytest.mark.parametrize('orc_impl', ["native", "hive"])
[2021-07-23T15:21:18.191Z]     def test_write_round_trip(spark_tmp_path, orc_gens, orc_impl):
[2021-07-23T15:21:18.191Z]         gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
[2021-07-23T15:21:18.191Z]         data_path = spark_tmp_path + '/ORC_DATA'
[2021-07-23T15:21:18.191Z]         assert_gpu_and_cpu_writes_are_equal_collect(
[2021-07-23T15:21:18.191Z]                 lambda spark, path: gen_df(spark, gen_list).coalesce(1).write.orc(path),
[2021-07-23T15:21:18.191Z]                 lambda spark, path: spark.read.orc(path),
[2021-07-23T15:21:18.191Z]                 data_path,
[2021-07-23T15:21:18.191Z] >               conf={'spark.sql.orc.impl': orc_impl, 'spark.rapids.sql.format.orc.write.enabled': True})
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z] ../../src/main/python/orc_write_test.py:39: 
[2021-07-23T15:21:18.191Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-07-23T15:21:18.191Z] ../../src/main/python/asserts.py:257: in assert_gpu_and_cpu_writes_are_equal_collect
[2021-07-23T15:21:18.191Z]     _assert_gpu_and_cpu_writes_are_equal(write_func, read_func, base_path, 'COLLECT', conf=conf)
[2021-07-23T15:21:18.191Z] ../../src/main/python/asserts.py:243: in _assert_gpu_and_cpu_writes_are_equal
[2021-07-23T15:21:18.191Z]     from_gpu = with_cpu_session(gpu_bring_back, conf=conf)
[2021-07-23T15:21:18.191Z] ../../src/main/python/spark_session.py:86: in with_cpu_session
[2021-07-23T15:21:18.191Z]     return with_spark_session(func, conf=copy)
[2021-07-23T15:21:18.191Z] ../../src/main/python/spark_session.py:70: in with_spark_session
[2021-07-23T15:21:18.191Z]     ret = func(_spark)
[2021-07-23T15:21:18.191Z] ../../src/main/python/asserts.py:190: in <lambda>
[2021-07-23T15:21:18.191Z]     bring_back = lambda spark: limit_func(spark).collect()
[2021-07-23T15:21:18.191Z] /databricks/spark/python/pyspark/sql/dataframe.py:611: in collect
[2021-07-23T15:21:18.191Z]     sock_info = self._jdf.collectToPython()
[2021-07-23T15:21:18.191Z] /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1305: in __call__
[2021-07-23T15:21:18.191Z]     answer, self.gateway_client, self.target_id, self.name)
[2021-07-23T15:21:18.191Z] /databricks/spark/python/pyspark/sql/utils.py:127: in deco
[2021-07-23T15:21:18.191Z]     return f(*a, **kw)
[2021-07-23T15:21:18.191Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z] answer = 'xro190967'
[2021-07-23T15:21:18.191Z] gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f8942d83550>
[2021-07-23T15:21:18.191Z] target_id = 'o190964', name = 'collectToPython'
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]     def get_return_value(answer, gateway_client, target_id=None, name=None):
[2021-07-23T15:21:18.191Z]         """Converts an answer received from the Java gateway into a Python object.
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]         For example, string representation of integers are converted to Python
[2021-07-23T15:21:18.191Z]         integer, string representation of objects are converted to JavaObject
[2021-07-23T15:21:18.191Z]         instances, etc.
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]         :param answer: the string returned by the Java gateway
[2021-07-23T15:21:18.191Z]         :param gateway_client: the gateway client used to communicate with the Java
[2021-07-23T15:21:18.192Z]             Gateway. Only necessary if the answer is a reference (e.g., object,
[2021-07-23T15:21:18.192Z]             list, map)
[2021-07-23T15:21:18.192Z]         :param target_id: the name of the object from which the answer comes from
[2021-07-23T15:21:18.192Z]             (e.g., *object1* in `object1.hello()`). Optional.
[2021-07-23T15:21:18.192Z]         :param name: the name of the member from which the answer comes from
[2021-07-23T15:21:18.192Z]             (e.g., *hello* in `object1.hello()`). Optional.
[2021-07-23T15:21:18.192Z]         """
[2021-07-23T15:21:18.192Z]         if is_error(answer)[0]:
[2021-07-23T15:21:18.192Z]             if len(answer) > 1:
[2021-07-23T15:21:18.192Z]                 type = answer[1]
[2021-07-23T15:21:18.192Z]                 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2021-07-23T15:21:18.192Z]                 if answer[1] == REFERENCE_TYPE:
[2021-07-23T15:21:18.192Z]                     raise Py4JJavaError(
[2021-07-23T15:21:18.192Z]                         "An error occurred while calling {0}{1}{2}.\n".
[2021-07-23T15:21:18.192Z] >                       format(target_id, ".", name), value)
[2021-07-23T15:21:18.192Z] E                   py4j.protocol.Py4JJavaError: An error occurred while calling o190964.collectToPython.
[2021-07-23T15:21:18.192Z] E                   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8691.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8691.0 (TID 42267, ip-10-59-167-85.us-west-2.compute.internal, executor driver): com.databricks.sql.io.FileReadException: Error while reading file file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc.
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:347)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:326)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:417)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:258)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:716)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:733)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processPartition$1(Collector.scala:179)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2433)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:117)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
[2021-07-23T15:21:18.192Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-07-23T15:21:18.192Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-07-23T15:21:18.192Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2021-07-23T15:21:18.192Z] E                   Caused by: java.io.IOException: Error reading file: file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc
[2021-07-23T15:21:18.192Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1329)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:196)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:41)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:291)
[2021-07-23T15:21:18.193Z] E                   	... 20 more
[2021-07-23T15:21:18.193Z] E                   Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 10 kind SECONDARY position: 1949 length: 1949 range: 0 offset: 49186 limit: 49186 range 0 = 0 to 1949 uncompressed: 1946 to 1946
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.TreeReaderFactory$TimestampTreeReader.nextVector(TreeReaderFactory.java:1041)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2059)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1322)
[2021-07-23T15:21:18.193Z] E                   	... 24 more
[2021-07-23T15:21:18.193Z] E                   
[2021-07-23T15:21:18.193Z] E                   Driver stacktrace:
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2519)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2466)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2460)
[2021-07-23T15:21:18.193Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2021-07-23T15:21:18.193Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2021-07-23T15:21:18.193Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2460)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1152)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1152)
[2021-07-23T15:21:18.193Z] E                   	at scala.Option.foreach(Option.scala:407)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1152)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2721)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2668)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2656)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2339)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2434)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:273)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:308)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:82)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:88)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:480)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:401)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3497)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3709)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:249)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:845)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:199)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3707)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3495)
[2021-07-23T15:21:18.193Z] E                   	at sun.reflect.GeneratedMethodAccessor130.invoke(Unknown Source)
[2021-07-23T15:21:18.194Z] E                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2021-07-23T15:21:18.194Z] E                   	at java.lang.reflect.Method.invoke(Method.java:498)
[2021-07-23T15:21:18.194Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2021-07-23T15:21:18.194Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
[2021-07-23T15:21:18.194Z] E                   	at py4j.Gateway.invoke(Gateway.java:295)
[2021-07-23T15:21:18.194Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2021-07-23T15:21:18.194Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2021-07-23T15:21:18.194Z] E                   	at py4j.GatewayConnection.run(GatewayConnection.java:251)
[2021-07-23T15:21:18.194Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2021-07-23T15:21:18.194Z] E                   Caused by: com.databricks.sql.io.FileReadException: Error while reading file file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc.
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:347)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:326)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:417)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:258)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:716)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:733)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processPartition$1(Collector.scala:179)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2433)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:117)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
[2021-07-23T15:21:18.194Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-07-23T15:21:18.194Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-07-23T15:21:18.194Z] E                   	... 1 more
[2021-07-23T15:21:18.194Z] E                   Caused by: java.io.IOException: Error reading file: file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc
[2021-07-23T15:21:18.194Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1329)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:196)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:41)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:291)
[2021-07-23T15:21:18.194Z] E                   	... 20 more
[2021-07-23T15:21:18.194Z] E                   Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 10 kind SECONDARY position: 1949 length: 1949 range: 0 offset: 49186 limit: 49186 range 0 = 0 to 1949 uncompressed: 1946 to 1946
[2021-07-23T15:21:18.194Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
[2021-07-23T15:21:18.195Z] E                   	at org.apache.orc.impl.TreeReaderFactory$TimestampTreeReader.nextVector(TreeReaderFactory.java:1041)
[2021-07-23T15:21:18.195Z] E                   	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2059)
[2021-07-23T15:21:18.195Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1322)
[2021-07-23T15:21:18.195Z] E                   	... 24 more
[2021-07-23T15:21:18.195Z] 

Steps/Code to reproduce bug
Run the orc_write_test integration tests; a minimal standalone sketch of the failing round trip follows.
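
For reference, here is a minimal sketch of the failing round trip outside the pytest harness, assuming a session with the RAPIDS Accelerator on the classpath; the path, schema, and values are illustrative, not taken from the CI run:

```python
# Repro sketch (hypothetical data): GPU write of an ORC file with a
# timestamp column, then CPU read-back, mirroring what
# assert_gpu_and_cpu_writes_are_equal_collect does in the failing test.
import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin") \
    .config("spark.sql.orc.impl", "native") \
    .config("spark.rapids.sql.format.orc.write.enabled", "true") \
    .getOrCreate()

data_path = "/tmp/orc_repro/ORC_DATA"  # illustrative path

# Write a single file on the GPU, matching the [..., Timestamp] parametrization.
df = spark.createDataFrame(
    [(i, datetime.datetime(2021, 7, 23, 15, 21, i % 60)) for i in range(1000)],
    "id long, ts timestamp")
df.coalesce(1).write.mode("overwrite").orc(data_path)

# Read back on the CPU: this is where the EOFException in the log surfaced.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.read.orc(data_path).collect()
```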

Expected behavior
The tests should pass.

Environment details (please complete the following information)
Databricks Runtime 7.3 and 8.2


pxLi added the bug (Something isn't working) and test (Only impacts tests) labels on Jul 24, 2021
tgravescs added the P0 (Must have for release) label on Jul 26, 2021
tgravescs (Collaborator) commented:

I manually ran just the orc_write_test suite on Databricks 8.2 using 21.08 built from source, and it passed for me.

It looks like this started failing on July 22nd, although there is a gap in the runs.

tgravescs (Collaborator) commented:

OK, the runs above were on Azure on V100s and they all passed. I just tried on AWS on T4s, and the tests fail there.

tgravescs (Collaborator) commented:

This seems to happen with timestamps. I have an ORC file that I can read and then write back, and that triggers the corruption (a sketch of the repro is below). I'm talking to the cuDF team and building cuDF in parallel.
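
A sketch of that read-then-write repro, reusing a plugin-enabled session like the one above; the input file name is a hypothetical stand-in for the ORC file I have, which is not attached here:

```python
# Read-then-write repro sketch. "input.orc" is a placeholder for an ORC
# file containing a timestamp column that triggers the corruption.
src = "/tmp/input.orc"            # hypothetical stand-in for the test file
dst = "/tmp/orc_repro/rewritten"

df = spark.read.orc(src)          # reading the original file works fine
df.coalesce(1).write.mode("overwrite").orc(dst)  # GPU write produces a bad file

# Reading the rewritten copy back on the CPU hits the EOFException on the
# timestamp column's SECONDARY stream, as in the log above.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.read.orc(dst).collect()
```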

tgravescs (Collaborator) commented:

Also note: if I use the cuDF jar from July 21st, the problem goes away.

tgravescs (Collaborator) commented:

Proposed fix: rapidsai/cudf#8861.
A manual test with the file I had shows it fixes the issue; running the tests on Databricks next.

tgravescs (Collaborator) commented:

With the cuDF fix, the tests all pass now.

tgravescs (Collaborator) commented:

Fixed by the cuDF change (rapidsai/cudf#8861).
