[BUG] test_spark_from_json_date_with_format FAILED on : Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec #10559

Closed · NvTimLiu opened this issue Mar 7, 2024 · 1 comment · Fixed by #10562
Labels: bug (Something isn't working)

NvTimLiu (Collaborator) commented Mar 7, 2024

Describe the bug

test_spark_from_json_date_with_format failed with: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec

Related PR: #10490

 FAILED ../../src/main/python/json_test.py::test_spark_from_json_date_with_format[DATAGEN_SEED=1709762464, INJECT_OOM] - pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec


=================================== FAILURES ===================================
____________________ test_spark_from_json_date_with_format _____________________
[gw3] linux -- Python 3.9.18 /opt/conda/bin/python

    @pytest.mark.skipif(is_before_spark_320(), reason="only dd/MM/yyyy is supported prior to 3.2.0")
    def test_spark_from_json_date_with_format():
        data = [["""{"time": "26/08/2015"}"""]]
        schema = StructType([StructField("d", DateType())])
>       assert_gpu_and_cpu_are_equal_collect(
                lambda spark : spark.createDataFrame(data, 'json STRING').select(f.col('json'), f.from_json(f.col('json'), schema, {'dateFormat': 'dd/MM/yyyy'})),
            conf = { 'spark.rapids.sql.expression.JsonToStructs': True })

../../src/main/python/json_test.py:1335:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:503: in _assert_gpu_and_cpu_are_equal
    from_gpu = run_on_gpu()
../../src/main/python/asserts.py:496: in run_on_gpu
    from_gpu = with_gpu_session(bring_back, conf=conf)
../../src/main/python/spark_session.py:164: in with_gpu_session
    return with_spark_session(func, conf=copy)
/opt/conda/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
../../src/main/python/spark_session.py:131: in with_spark_session
    ret = func(_spark)
../../src/main/python/asserts.py:205: in <lambda>
    bring_back = lambda spark: limit_func(spark).collect()
../../../spark-3.3.0-bin-hadoop3/python/pyspark/sql/dataframe.py:817: in collect
    sock_info = self._jdf.collectToPython()
/home/jenkins/agent/workspace/jenkins-rapids_it-non-utc-dev-71/jars/spark-3.3.0-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321: in __call__
    return_value = get_return_value(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = ('xro1012184', <py4j.clientserver.JavaClient object at 0x7f0924f23be0>, 'o1012183', 'collectToPython')
kw = {}, converted = IllegalArgumentException()

    def deco(*a: Any, **kw: Any) -> Any:
        try:
            return f(*a, **kw)
        except Py4JJavaError as e:
            converted = convert_exception(e.java_exception)
            if not isinstance(converted, UnknownException):
                # Hide where the exception came from that shows a non-Pythonic
                # JVM exception message.
>               raise converted from None
E               pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec
E               Project [json#111935, from_json(StructField(d,DateType,true), (dateFormat,dd/MM/yyyy), json#111935, Some(Canada/Newfoundland)) AS from_json(json)#111937]
E               +- Scan ExistingRDD[json#111935]

../../../spark-3.3.0-bin-hadoop3/python/pyspark/sql/utils.py:196: IllegalArgumentException
----------------------------- Captured stdout call -----------------------------
### CPU RUN ###
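
The physical plan in the failure shows the session time zone as Some(Canada/Newfoundland), i.e. the run used a non-UTC time zone. Below is a minimal standalone sketch of the failure mode, assuming the RAPIDS Accelerator jar is on the classpath; spark.rapids.sql.test.enabled is what turns a silent CPU fallback into the "Part of the plan is not columnar" error, and the non-UTC-time-zone trigger is an assumption inferred from the plan above, not confirmed in this issue:

    # Hypothetical standalone repro (not from the issue); assumes the RAPIDS
    # Accelerator plugin is available and that the non-UTC session time zone
    # is what forces JsonToStructs (and hence ProjectExec) back onto the CPU.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as f
    from pyspark.sql.types import StructType, StructField, DateType

    spark = (SparkSession.builder
             .config('spark.plugins', 'com.nvidia.spark.SQLPlugin')
             .config('spark.rapids.sql.test.enabled', 'true')  # fail instead of silently falling back
             .config('spark.rapids.sql.expression.JsonToStructs', 'true')
             .config('spark.sql.session.timeZone', 'Canada/Newfoundland')
             .getOrCreate())

    schema = StructType([StructField('d', DateType())])
    df = spark.createDataFrame([['{"time": "26/08/2015"}']], 'json STRING')
    # With test mode on, a CPU ProjectExec left in the plan raises
    # IllegalArgumentException: Part of the plan is not columnar ...
    df.select(f.from_json(f.col('json'), schema, {'dateFormat': 'dd/MM/yyyy'})).collect()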
NvTimLiu added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) on Mar 7, 2024
revans2 (Collaborator) commented Mar 7, 2024

Looks like I missed adding a @allow_non_gpu(*non_utc_allow) to the test. It would be really nice if the issue listed which version of Spark the test ran against and, if it ran in a non-default time zone, which time zone that was; otherwise the failure is very difficult to reproduce. Perhaps I should just add that information to the test names.
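
For reference, a sketch of the fix described above (the actual change landed in #10562). It assumes the existing spark-rapids integration-test helpers allow_non_gpu and non_utc_allow; the decorator whitelists the execs that legitimately stay on the CPU when the session time zone is not UTC, so the columnar-plan assertion no longer fires:

    # Sketch only, not necessarily the exact patch from #10562. allow_non_gpu
    # and non_utc_allow are assumed to be the helpers already used elsewhere
    # in the spark-rapids integration tests; the test body is from the
    # traceback above.
    @allow_non_gpu(*non_utc_allow)
    @pytest.mark.skipif(is_before_spark_320(), reason="only dd/MM/yyyy is supported prior to 3.2.0")
    def test_spark_from_json_date_with_format():
        data = [["""{"time": "26/08/2015"}"""]]
        schema = StructType([StructField("d", DateType())])
        assert_gpu_and_cpu_are_equal_collect(
            lambda spark: spark.createDataFrame(data, 'json STRING').select(
                f.col('json'),
                f.from_json(f.col('json'), schema, {'dateFormat': 'dd/MM/yyyy'})),
            conf={'spark.rapids.sql.expression.JsonToStructs': True})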
