[BUG] test_write_empty_parquet_round_trip failed #4749

Closed
jlowe opened this issue Feb 10, 2022 · 0 comments · Fixed by #4778

jlowe commented Feb 10, 2022

test_write_empty_parquet_round_trip failed with what appears to be an accidental collision with another test (a minimal sketch of how this kind of path collision arises follows the traceback):

[2022-02-10T21:39:50.733Z] _ test_write_empty_parquet_round_trip[TIMESTAMP_MICROS--reader_confs0-[Array(Long)]] _
[2022-02-10T21:39:50.733Z] 
[2022-02-10T21:39:50.733Z] spark_tmp_path = '/tmp/pyspark_tests//823772/', parquet_gens = [Array(Long)]
[2022-02-10T21:39:50.733Z] v1_enabled_list = '', ts_type = 'TIMESTAMP_MICROS'
[2022-02-10T21:39:50.733Z] reader_confs = {'spark.rapids.sql.format.parquet.reader.type': 'PERFILE'}
[2022-02-10T21:39:50.733Z] 
[2022-02-10T21:39:50.733Z]     @pytest.mark.parametrize('parquet_gens', parquet_write_gens_list, ids=idfn)
[2022-02-10T21:39:50.733Z]     @pytest.mark.parametrize('reader_confs', reader_opt_confs)
[2022-02-10T21:39:50.733Z]     @pytest.mark.parametrize('v1_enabled_list', ["", "parquet"])
[2022-02-10T21:39:50.733Z]     @pytest.mark.parametrize('ts_type', parquet_ts_write_options)
[2022-02-10T21:39:50.733Z]     def test_write_empty_parquet_round_trip(spark_tmp_path,
[2022-02-10T21:39:50.733Z]                                             parquet_gens,
[2022-02-10T21:39:50.733Z]                                             v1_enabled_list,
[2022-02-10T21:39:50.733Z]                                             ts_type,
[2022-02-10T21:39:50.733Z]                                             reader_confs):
[2022-02-10T21:39:50.733Z]         def create_empty_df(spark, path):
[2022-02-10T21:39:50.733Z]             gen_list = [('_c' + str(i), gen) for i, gen in enumerate(parquet_gens)]
[2022-02-10T21:39:50.733Z]             return gen_df(spark, gen_list, length=0).write.parquet(path)
[2022-02-10T21:39:50.733Z]         data_path = spark_tmp_path + '/PARQUET_DATA'
[2022-02-10T21:39:50.733Z]         all_confs = copy_and_update(reader_confs, writer_confs, {
[2022-02-10T21:39:50.733Z]             'spark.sql.sources.useV1SourceList': v1_enabled_list,
[2022-02-10T21:39:50.733Z]             'spark.sql.parquet.outputTimestampType': ts_type})
[2022-02-10T21:39:50.734Z] >       assert_gpu_and_cpu_writes_are_equal_collect(
[2022-02-10T21:39:50.734Z]             create_empty_df,
[2022-02-10T21:39:50.734Z]             lambda spark, path: spark.read.parquet(path),
[2022-02-10T21:39:50.734Z]             data_path,
[2022-02-10T21:39:50.734Z]             conf=all_confs)
[2022-02-10T21:39:50.734Z] 
[2022-02-10T21:39:50.734Z] ../../src/main/python/parquet_write_test.py:411: 
[2022-02-10T21:39:50.734Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-02-10T21:39:50.734Z] ../../src/main/python/asserts.py:265: in assert_gpu_and_cpu_writes_are_equal_collect
[2022-02-10T21:39:50.734Z]     _assert_gpu_and_cpu_writes_are_equal(write_func, read_func, base_path, 'COLLECT', conf=conf)
[2022-02-10T21:39:50.734Z] ../../src/main/python/asserts.py:235: in _assert_gpu_and_cpu_writes_are_equal
[2022-02-10T21:39:50.734Z]     with_cpu_session(lambda spark : write_func(spark, cpu_path), conf=conf)
[2022-02-10T21:39:50.734Z] ../../src/main/python/spark_session.py:86: in with_cpu_session
[2022-02-10T21:39:50.734Z]     return with_spark_session(func, conf=copy)
[2022-02-10T21:39:50.734Z] ../../src/main/python/spark_session.py:70: in with_spark_session
[2022-02-10T21:39:50.734Z]     ret = func(_spark)
[2022-02-10T21:39:50.734Z] ../../src/main/python/asserts.py:235: in <lambda>
[2022-02-10T21:39:50.734Z]     with_cpu_session(lambda spark : write_func(spark, cpu_path), conf=conf)
[2022-02-10T21:39:50.734Z] ../../src/main/python/parquet_write_test.py:406: in create_empty_df
[2022-02-10T21:39:50.734Z]     return gen_df(spark, gen_list, length=0).write.parquet(path)
[2022-02-10T21:39:50.734Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-pre_release-github-432-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py:1249: in parquet
[2022-02-10T21:39:50.734Z]     self._jwrite.parquet(path)
[2022-02-10T21:39:50.734Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-pre_release-github-432-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-02-10T21:39:50.734Z]     return_value = get_return_value(
[2022-02-10T21:39:50.734Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-02-10T21:39:50.734Z] 
[2022-02-10T21:39:50.734Z] a = ('xro82317', <py4j.java_gateway.GatewayClient object at 0x7f066da7af10>, 'o82316', 'parquet')
[2022-02-10T21:39:50.734Z] kw = {}
[2022-02-10T21:39:50.734Z] converted = AnalysisException('path file:/tmp/pyspark_tests/823772/PARQUET_DATA/CPU already exists.', 'org.apache.spark.sql.Analys...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\n', None)
[2022-02-10T21:39:50.734Z] 
[2022-02-10T21:39:50.734Z]     def deco(*a, **kw):
[2022-02-10T21:39:50.734Z]         try:
[2022-02-10T21:39:50.734Z]             return f(*a, **kw)
[2022-02-10T21:39:50.734Z]         except py4j.protocol.Py4JJavaError as e:
[2022-02-10T21:39:50.734Z]             converted = convert_exception(e.java_exception)
[2022-02-10T21:39:50.734Z]             if not isinstance(converted, UnknownException):
[2022-02-10T21:39:50.734Z]                 # Hide where the exception came from that shows a non-Pythonic
[2022-02-10T21:39:50.734Z]                 # JVM exception message.
[2022-02-10T21:39:50.734Z] >               raise converted from None
[2022-02-10T21:39:50.734Z] E               pyspark.sql.utils.AnalysisException: path file:/tmp/pyspark_tests/823772/PARQUET_DATA/CPU already exists.
[2022-02-10T21:39:50.734Z] 
[2022-02-10T21:39:50.734Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-pre_release-github-432-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:117: AnalysisException
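
The exception comes from Spark's default `errorifexists` save mode: if two test invocations end up writing to the same output directory (here `/tmp/pyspark_tests/823772/PARQUET_DATA/CPU`), the second write fails before any data is produced. Below is a minimal, self-contained sketch of that behavior and two common ways tests sidestep it (a unique directory per invocation, or an explicit overwrite mode). This is illustrative only: it uses a plain local SparkSession and made-up paths rather than the spark-rapids test fixtures, and it is not the actual fix from #4778.

```python
# Illustrative sketch only (assumed names/paths, not the actual fix in #4778):
# Spark's DataFrameWriter defaults to the 'errorifexists' save mode, so a
# second write into the same directory raises the AnalysisException seen above.
import uuid

from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = (SparkSession.builder
         .master("local[1]")
         .appName("empty-parquet-write-collision-demo")
         .getOrCreate())

# Hypothetical stand-in for the per-test temp dir (spark_tmp_path in the test).
base_path = "/tmp/pyspark_tests/{}".format(uuid.uuid4())
shared_path = base_path + "/PARQUET_DATA"

df = spark.range(0)  # an empty DataFrame, mirroring gen_df(..., length=0)

df.write.parquet(shared_path)       # first write creates the directory
try:
    df.write.parquet(shared_path)   # second write to the same path fails
except AnalysisException as e:
    print("second write failed as expected:", e)

# Two common ways to avoid the collision:
unique_path = "{}/{}/PARQUET_DATA".format(base_path, uuid.uuid4())
df.write.parquet(unique_path)                    # 1) unique directory per run
df.write.mode("overwrite").parquet(shared_path)  # 2) explicit overwrite mode

spark.stop()
```

In the failing run above, the per-test temp path (`/tmp/pyspark_tests//823772/`) appears to have been shared with another test, which is why the CPU output directory already existed when this test tried to write to it.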
jlowe added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Feb 10, 2022
jlowe self-assigned this on Feb 11, 2022
jlowe added this to the Feb 14 - Feb 25 milestone on Feb 11, 2022
sameerz removed the ? - Needs Triage (Need team to review and classify) label on Feb 11, 2022