[BUG] test_spark_from_json_date_with_format FAILED on : Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec #10559

Closed · NvTimLiu opened this issue Mar 7, 2024 · 1 comment · Fixed by #10562
Labels: bug (Something isn't working)

NvTimLiu (Collaborator) commented Mar 7, 2024

Describe the bug

test_spark_from_json_date_with_format failed with: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec

Related PR: #10490

 FAILED ../../src/main/python/json_test.py::test_spark_from_json_date_with_format[DATAGEN_SEED=1709762464, INJECT_OOM] - pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec


=================================== FAILURES ===================================
____________________ test_spark_from_json_date_with_format _____________________
[gw3] linux -- Python 3.9.18 /opt/conda/bin/python

    @pytest.mark.skipif(is_before_spark_320(), reason="only dd/MM/yyyy is supported prior to 3.2.0")
    def test_spark_from_json_date_with_format():
        data = [["""{"time": "26/08/2015"}"""]]
        schema = StructType([StructField("d", DateType())])
>       assert_gpu_and_cpu_are_equal_collect(
                lambda spark : spark.createDataFrame(data, 'json STRING').select(f.col('json'), f.from_json(f.col('json'), schema, {'dateFormat': 'dd/MM/yyyy'})),
            conf = { 'spark.rapids.sql.expression.JsonToStructs': True })

../../src/main/python/json_test.py:1335:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:503: in _assert_gpu_and_cpu_are_equal
    from_gpu = run_on_gpu()
../../src/main/python/asserts.py:496: in run_on_gpu
    from_gpu = with_gpu_session(bring_back, conf=conf)
../../src/main/python/spark_session.py:164: in with_gpu_session
    return with_spark_session(func, conf=copy)
/opt/conda/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
../../src/main/python/spark_session.py:131: in with_spark_session
    ret = func(_spark)
../../src/main/python/asserts.py:205: in <lambda>
    bring_back = lambda spark: limit_func(spark).collect()
../../../spark-3.3.0-bin-hadoop3/python/pyspark/sql/dataframe.py:817: in collect
    sock_info = self._jdf.collectToPython()
/home/jenkins/agent/workspace/jenkins-rapids_it-non-utc-dev-71/jars/spark-3.3.0-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321: in __call__
    return_value = get_return_value(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = ('xro1012184', <py4j.clientserver.JavaClient object at 0x7f0924f23be0>, 'o1012183', 'collectToPython')
kw = {}, converted = IllegalArgumentException()

    def deco(*a: Any, **kw: Any) -> Any:
        try:
            return f(*a, **kw)
        except Py4JJavaError as e:
            converted = convert_exception(e.java_exception)
            if not isinstance(converted, UnknownException):
                # Hide where the exception came from that shows a non-Pythonic
                # JVM exception message.
>               raise converted from None
E               pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec
E               Project [json#111935, from_json(StructField(d,DateType,true), (dateFormat,dd/MM/yyyy), json#111935, Some(Canada/Newfoundland)) AS from_json(json)#111937]
E               +- Scan ExistingRDD[json#111935]

../../../spark-3.3.0-bin-hadoop3/python/pyspark/sql/utils.py:196: IllegalArgumentException
----------------------------- Captured stdout call -----------------------------
### CPU RUN ###
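
The physical plan in the failure shows the session time zone as Some(Canada/Newfoundland), i.e. the run used a non-UTC time zone. Below is a minimal standalone sketch of the failure mode, assuming the RAPIDS Accelerator jar is on the classpath; spark.rapids.sql.test.enabled is what turns a silent CPU fallback into the "Part of the plan is not columnar" error, and the non-UTC-time-zone trigger is an assumption inferred from the plan above, not confirmed in this issue:

    # Hypothetical standalone repro (not from the issue); assumes the RAPIDS
    # Accelerator plugin is available and that the non-UTC session time zone
    # is what forces JsonToStructs (and hence ProjectExec) back onto the CPU.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as f
    from pyspark.sql.types import StructType, StructField, DateType

    spark = (SparkSession.builder
             .config('spark.plugins', 'com.nvidia.spark.SQLPlugin')
             .config('spark.rapids.sql.test.enabled', 'true')  # fail instead of silently falling back
             .config('spark.rapids.sql.expression.JsonToStructs', 'true')
             .config('spark.sql.session.timeZone', 'Canada/Newfoundland')
             .getOrCreate())

    schema = StructType([StructField('d', DateType())])
    df = spark.createDataFrame([['{"time": "26/08/2015"}']], 'json STRING')
    # With test mode on, a CPU ProjectExec left in the plan raises
    # IllegalArgumentException: Part of the plan is not columnar ...
    df.select(f.from_json(f.col('json'), schema, {'dateFormat': 'dd/MM/yyyy'})).collect()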
NvTimLiu added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) on Mar 7, 2024
revans2 (Collaborator) commented Mar 7, 2024

Looks like I missed adding a @allow_non_gpu(*non_utc_allow) to the test. It would be really nice if the issue listed which version of Spark the test ran against and, if it ran in a non-default time zone, which time zone that was; otherwise the failure is very difficult to reproduce. Perhaps I should just add that information to the test names.
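
For reference, a sketch of the fix described above (the actual change landed in #10562). It assumes the existing spark-rapids integration-test helpers allow_non_gpu and non_utc_allow; the decorator whitelists the execs that legitimately stay on the CPU when the session time zone is not UTC, so the columnar-plan assertion no longer fires:

    # Sketch only, not necessarily the exact patch from #10562. allow_non_gpu
    # and non_utc_allow are assumed to be the helpers already used elsewhere
    # in the spark-rapids integration tests; the test body is from the
    # traceback above.
    @allow_non_gpu(*non_utc_allow)
    @pytest.mark.skipif(is_before_spark_320(), reason="only dd/MM/yyyy is supported prior to 3.2.0")
    def test_spark_from_json_date_with_format():
        data = [["""{"time": "26/08/2015"}"""]]
        schema = StructType([StructField("d", DateType())])
        assert_gpu_and_cpu_are_equal_collect(
            lambda spark: spark.createDataFrame(data, 'json STRING').select(
                f.col('json'),
                f.from_json(f.col('json'), schema, {'dateFormat': 'dd/MM/yyyy'})),
            conf={'spark.rapids.sql.expression.JsonToStructs': True})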
