[BUG] orc_write_test failed in databricks runtime #3017

Closed
pxLi opened this issue Jul 24, 2021 · 7 comments
Labels: bug (Something isn't working), P0 (Must have for release), test (Only impacts tests)

Comments

pxLi (Collaborator) commented Jul 24, 2021

Describe the bug
This failure was hidden by #3016; I found it while skimming the logs.

[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_round_trip[native-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_round_trip[hive-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_save_table[native-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_save_table[hive-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[native-TIMESTAMP_MICROS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[native-TIMESTAMP_MILLIS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[hive-TIMESTAMP_MICROS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]
[2021-07-23T15:21:18.484Z] FAILED ../../src/main/python/orc_write_test.py::test_write_sql_save_table[hive-TIMESTAMP_MILLIS-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]]

All of the tests above failed with the same error; detailed log below:

[2021-07-23T15:21:18.191Z] _ test_write_round_trip[native-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]] _
[2021-07-23T15:21:18.191Z] [gw4] linux -- Python 3.7.10 /databricks/conda/envs/databricks-ml-gpu/bin/python
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z] spark_tmp_path = '/tmp/pyspark_tests//363/'
[2021-07-23T15:21:18.191Z] orc_gens = [Byte, Short, Integer, Long, Float, Double, ...], orc_impl = 'native'
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]     @pytest.mark.parametrize('orc_gens', orc_write_gens_list, ids=idfn)
[2021-07-23T15:21:18.191Z]     @pytest.mark.parametrize('orc_impl', ["native", "hive"])
[2021-07-23T15:21:18.191Z]     def test_write_round_trip(spark_tmp_path, orc_gens, orc_impl):
[2021-07-23T15:21:18.191Z]         gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
[2021-07-23T15:21:18.191Z]         data_path = spark_tmp_path + '/ORC_DATA'
[2021-07-23T15:21:18.191Z]         assert_gpu_and_cpu_writes_are_equal_collect(
[2021-07-23T15:21:18.191Z]                 lambda spark, path: gen_df(spark, gen_list).coalesce(1).write.orc(path),
[2021-07-23T15:21:18.191Z]                 lambda spark, path: spark.read.orc(path),
[2021-07-23T15:21:18.191Z]                 data_path,
[2021-07-23T15:21:18.191Z] >               conf={'spark.sql.orc.impl': orc_impl, 'spark.rapids.sql.format.orc.write.enabled': True})
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z] ../../src/main/python/orc_write_test.py:39: 
[2021-07-23T15:21:18.191Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-07-23T15:21:18.191Z] ../../src/main/python/asserts.py:257: in assert_gpu_and_cpu_writes_are_equal_collect
[2021-07-23T15:21:18.191Z]     _assert_gpu_and_cpu_writes_are_equal(write_func, read_func, base_path, 'COLLECT', conf=conf)
[2021-07-23T15:21:18.191Z] ../../src/main/python/asserts.py:243: in _assert_gpu_and_cpu_writes_are_equal
[2021-07-23T15:21:18.191Z]     from_gpu = with_cpu_session(gpu_bring_back, conf=conf)
[2021-07-23T15:21:18.191Z] ../../src/main/python/spark_session.py:86: in with_cpu_session
[2021-07-23T15:21:18.191Z]     return with_spark_session(func, conf=copy)
[2021-07-23T15:21:18.191Z] ../../src/main/python/spark_session.py:70: in with_spark_session
[2021-07-23T15:21:18.191Z]     ret = func(_spark)
[2021-07-23T15:21:18.191Z] ../../src/main/python/asserts.py:190: in <lambda>
[2021-07-23T15:21:18.191Z]     bring_back = lambda spark: limit_func(spark).collect()
[2021-07-23T15:21:18.191Z] /databricks/spark/python/pyspark/sql/dataframe.py:611: in collect
[2021-07-23T15:21:18.191Z]     sock_info = self._jdf.collectToPython()
[2021-07-23T15:21:18.191Z] /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1305: in __call__
[2021-07-23T15:21:18.191Z]     answer, self.gateway_client, self.target_id, self.name)
[2021-07-23T15:21:18.191Z] /databricks/spark/python/pyspark/sql/utils.py:127: in deco
[2021-07-23T15:21:18.191Z]     return f(*a, **kw)
[2021-07-23T15:21:18.191Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z] answer = 'xro190967'
[2021-07-23T15:21:18.191Z] gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f8942d83550>
[2021-07-23T15:21:18.191Z] target_id = 'o190964', name = 'collectToPython'
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]     def get_return_value(answer, gateway_client, target_id=None, name=None):
[2021-07-23T15:21:18.191Z]         """Converts an answer received from the Java gateway into a Python object.
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]         For example, string representation of integers are converted to Python
[2021-07-23T15:21:18.191Z]         integer, string representation of objects are converted to JavaObject
[2021-07-23T15:21:18.191Z]         instances, etc.
[2021-07-23T15:21:18.191Z] 
[2021-07-23T15:21:18.191Z]         :param answer: the string returned by the Java gateway
[2021-07-23T15:21:18.191Z]         :param gateway_client: the gateway client used to communicate with the Java
[2021-07-23T15:21:18.192Z]             Gateway. Only necessary if the answer is a reference (e.g., object,
[2021-07-23T15:21:18.192Z]             list, map)
[2021-07-23T15:21:18.192Z]         :param target_id: the name of the object from which the answer comes from
[2021-07-23T15:21:18.192Z]             (e.g., *object1* in `object1.hello()`). Optional.
[2021-07-23T15:21:18.192Z]         :param name: the name of the member from which the answer comes from
[2021-07-23T15:21:18.192Z]             (e.g., *hello* in `object1.hello()`). Optional.
[2021-07-23T15:21:18.192Z]         """
[2021-07-23T15:21:18.192Z]         if is_error(answer)[0]:
[2021-07-23T15:21:18.192Z]             if len(answer) > 1:
[2021-07-23T15:21:18.192Z]                 type = answer[1]
[2021-07-23T15:21:18.192Z]                 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2021-07-23T15:21:18.192Z]                 if answer[1] == REFERENCE_TYPE:
[2021-07-23T15:21:18.192Z]                     raise Py4JJavaError(
[2021-07-23T15:21:18.192Z]                         "An error occurred while calling {0}{1}{2}.\n".
[2021-07-23T15:21:18.192Z] >                       format(target_id, ".", name), value)
[2021-07-23T15:21:18.192Z] E                   py4j.protocol.Py4JJavaError: An error occurred while calling o190964.collectToPython.
[2021-07-23T15:21:18.192Z] E                   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8691.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8691.0 (TID 42267, ip-10-59-167-85.us-west-2.compute.internal, executor driver): com.databricks.sql.io.FileReadException: Error while reading file file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc.
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:347)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:326)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:417)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:258)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:716)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:733)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processPartition$1(Collector.scala:179)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2433)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:117)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
[2021-07-23T15:21:18.192Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-07-23T15:21:18.192Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-07-23T15:21:18.192Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2021-07-23T15:21:18.192Z] E                   Caused by: java.io.IOException: Error reading file: file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc
[2021-07-23T15:21:18.192Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1329)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:196)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:41)
[2021-07-23T15:21:18.192Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:291)
[2021-07-23T15:21:18.193Z] E                   	... 20 more
[2021-07-23T15:21:18.193Z] E                   Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 10 kind SECONDARY position: 1949 length: 1949 range: 0 offset: 49186 limit: 49186 range 0 = 0 to 1949 uncompressed: 1946 to 1946
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.TreeReaderFactory$TimestampTreeReader.nextVector(TreeReaderFactory.java:1041)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2059)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1322)
[2021-07-23T15:21:18.193Z] E                   	... 24 more
[2021-07-23T15:21:18.193Z] E                   
[2021-07-23T15:21:18.193Z] E                   Driver stacktrace:
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2519)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2466)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2460)
[2021-07-23T15:21:18.193Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2021-07-23T15:21:18.193Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2021-07-23T15:21:18.193Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2460)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1152)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1152)
[2021-07-23T15:21:18.193Z] E                   	at scala.Option.foreach(Option.scala:407)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1152)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2721)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2668)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2656)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2339)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2434)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:273)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:308)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:82)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:88)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:480)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:401)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3497)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3709)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:249)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:845)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:199)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3707)
[2021-07-23T15:21:18.193Z] E                   	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3495)
[2021-07-23T15:21:18.193Z] E                   	at sun.reflect.GeneratedMethodAccessor130.invoke(Unknown Source)
[2021-07-23T15:21:18.194Z] E                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2021-07-23T15:21:18.194Z] E                   	at java.lang.reflect.Method.invoke(Method.java:498)
[2021-07-23T15:21:18.194Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2021-07-23T15:21:18.194Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
[2021-07-23T15:21:18.194Z] E                   	at py4j.Gateway.invoke(Gateway.java:295)
[2021-07-23T15:21:18.194Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2021-07-23T15:21:18.194Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2021-07-23T15:21:18.194Z] E                   	at py4j.GatewayConnection.run(GatewayConnection.java:251)
[2021-07-23T15:21:18.194Z] E                   	at java.lang.Thread.run(Thread.java:748)
[2021-07-23T15:21:18.194Z] E                   Caused by: com.databricks.sql.io.FileReadException: Error while reading file file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc.
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:347)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:326)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:417)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:258)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:716)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:733)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processPartition$1(Collector.scala:179)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2433)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:117)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
[2021-07-23T15:21:18.194Z] E                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-07-23T15:21:18.194Z] E                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-07-23T15:21:18.194Z] E                   	... 1 more
[2021-07-23T15:21:18.194Z] E                   Caused by: java.io.IOException: Error reading file: file:/tmp/pyspark_tests/363/ORC_DATA/GPU/part-00000-tid-5400963992245163165-3a9ff602-9635-4884-9345-619dd5608bf5-42262-1-c000.snappy.orc
[2021-07-23T15:21:18.194Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1329)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:196)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:41)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:291)
[2021-07-23T15:21:18.194Z] E                   	... 20 more
[2021-07-23T15:21:18.194Z] E                   Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 10 kind SECONDARY position: 1949 length: 1949 range: 0 offset: 49186 limit: 49186 range 0 = 0 to 1949 uncompressed: 1946 to 1946
[2021-07-23T15:21:18.194Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
[2021-07-23T15:21:18.194Z] E                   	at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
[2021-07-23T15:21:18.195Z] E                   	at org.apache.orc.impl.TreeReaderFactory$TimestampTreeReader.nextVector(TreeReaderFactory.java:1041)
[2021-07-23T15:21:18.195Z] E                   	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2059)
[2021-07-23T15:21:18.195Z] E                   	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1322)
[2021-07-23T15:21:18.195Z] E                   	... 24 more
[2021-07-23T15:21:18.195Z] 

Steps/Code to reproduce bug
Run the orc_write_test integration tests; a minimal standalone sketch of the failing round trip follows.
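
For reference, here is a minimal sketch of the failing round trip outside the pytest harness, assuming a session with the RAPIDS Accelerator on the classpath; the path, schema, and values are illustrative, not taken from the CI run:

```python
# Repro sketch (hypothetical data): GPU write of an ORC file with a
# timestamp column, then CPU read-back, mirroring what
# assert_gpu_and_cpu_writes_are_equal_collect does in the failing test.
import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin") \
    .config("spark.sql.orc.impl", "native") \
    .config("spark.rapids.sql.format.orc.write.enabled", "true") \
    .getOrCreate()

data_path = "/tmp/orc_repro/ORC_DATA"  # illustrative path

# Write a single file on the GPU, matching the [..., Timestamp] parametrization.
df = spark.createDataFrame(
    [(i, datetime.datetime(2021, 7, 23, 15, 21, i % 60)) for i in range(1000)],
    "id long, ts timestamp")
df.coalesce(1).write.mode("overwrite").orc(data_path)

# Read back on the CPU: this is where the EOFException in the log surfaced.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.read.orc(data_path).collect()
```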

Expected behavior
The tests should pass.

Environment details (please complete the following information)
Databricks Runtime 7.3 and 8.2


pxLi added the bug (Something isn't working) and test (Only impacts tests) labels on Jul 24, 2021
tgravescs added the P0 (Must have for release) label on Jul 26, 2021
tgravescs (Collaborator) commented:

I manually ran just the orc_write_test suite on Databricks 8.2 using 21.08 built from source, and it passed for me.

It looks like this started failing on July 22nd, although there is a gap in the runs.

tgravescs (Collaborator) commented:

OK, the runs above were on Azure on V100s and they all passed. I just tried on AWS on T4s, and the tests fail there.

tgravescs (Collaborator) commented:

This seems to happen with timestamps. I have an ORC file that I can read and then write back, and that triggers the corruption (a sketch of the repro is below). I'm talking to the cuDF team and building cuDF in parallel.
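
A sketch of that read-then-write repro, reusing a plugin-enabled session like the one above; the input file name is a hypothetical stand-in for the ORC file I have, which is not attached here:

```python
# Read-then-write repro sketch. "input.orc" is a placeholder for an ORC
# file containing a timestamp column that triggers the corruption.
src = "/tmp/input.orc"            # hypothetical stand-in for the test file
dst = "/tmp/orc_repro/rewritten"

df = spark.read.orc(src)          # reading the original file works fine
df.coalesce(1).write.mode("overwrite").orc(dst)  # GPU write produces a bad file

# Reading the rewritten copy back on the CPU hits the EOFException on the
# timestamp column's SECONDARY stream, as in the log above.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.read.orc(dst).collect()
```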

tgravescs (Collaborator) commented:

Also note: if I use the cuDF jar from July 21st, the problem goes away.

tgravescs (Collaborator) commented:

Proposed fix: rapidsai/cudf#8861.
A manual test with the file I had shows it fixes the issue; running the tests on Databricks next.

tgravescs (Collaborator) commented:

With the cuDF fix, the tests all pass now.

tgravescs (Collaborator) commented:

Fixed by the cuDF change (rapidsai/cudf#8861).
