
[BUG] qa_nightly_select_test.py::test_select FAILED on the Dataproc Cluster #2230

Closed
NvTimLiu opened this issue Apr 22, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@NvTimLiu
Collaborator

Describe the bug
=================================== FAILURES ===================================
_______________ test_select[REGEXP_REPLACE(strF, 'Yu', 'Eric')] ________________

sql_query_line = ("SELECT REGEXP_REPLACE(strF, 'Yu', 'Eric') FROM test_table", "REGEXP_REPLACE(strF, 'Yu', 'Eric')")
pytestconfig = <_pytest.config.Config object at 0x7efca578e7c0>

  @approximate_float
  @incompat
  @qarun
  @pytest.mark.parametrize('sql_query_line', SELECT_SQL, ids=idfn)
  def test_select(sql_query_line, pytestconfig):
      sql_query = sql_query_line[0]
      if sql_query:
          print(sql_query)
          with_cpu_session(num_stringDf)
      assert_gpu_and_cpu_are_equal_collect(lambda spark: spark.sql(sql_query), conf=_qa_conf)

integration_tests/src/main/python/qa_nightly_select_test.py:167:


integration_tests/src/main/python/asserts.py:360: in assert_gpu_and_cpu_are_equal_collect
_assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
integration_tests/src/main/python/asserts.py:341: in _assert_gpu_and_cpu_are_equal
run_on_gpu()
integration_tests/src/main/python/asserts.py:334: in run_on_gpu
from_gpu = with_gpu_session(bring_back,
integration_tests/src/main/python/spark_session.py:95: in with_gpu_session
return with_spark_session(func, conf=copy)
integration_tests/src/main/python/spark_session.py:68: in with_spark_session
ret = func(_spark)
integration_tests/src/main/python/asserts.py:179: in &lt;lambda&gt;
bring_back = lambda spark: limit_func(spark).collect()
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
sock_info = self._jdf.collectToPython()
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
return_value = get_return_value(
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/pyspark.zip/pyspark/sql/utils.py:111: in deco
return f(*a, **kw)


answer = 'xro1134345'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x7efca53d6040>
target_id = 'o1134344', name = 'collectToPython'

  def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

For example, string representation of integers are converted to Python
integer, string representation of objects are converted to JavaObject
instances, etc.

:param answer: the string returned by the Java gateway
:param gateway_client: the gateway client used to communicate with the Java
    Gateway. Only necessary if the answer is a reference (e.g., object,
    list, map)
:param target_id: the name of the object from which the answer comes from
    (e.g., *object1* in `object1.hello()`). Optional.
:param name: the name of the member from which the answer comes from
    (e.g., *hello* in `object1.hello()`). Optional.
"""
      if is_error(answer)[0]:
          if len(answer) > 1:
              type = answer[1]
              value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
              if answer[1] == REFERENCE_TYPE:
               raise Py4JJavaError(
                      "An error occurred while calling {0}{1}{2}.\n".
                      format(target_id, ".", name), value)
              py4j.protocol.Py4JJavaError: An error occurred while calling o1134344.collectToPython.
              : scala.MatchError: List(strF#496130, Yu, Eric, 1) (of class scala.collection.immutable.$colon$colon)
              	at com.nvidia.spark.rapids.shims.spark311.Spark311Shims$$anon$4.convertToGpu(Spark311Shims.scala:194)
              	at com.nvidia.spark.rapids.shims.spark311.Spark311Shims$$anon$4.convertToGpu(Spark311Shims.scala:180)
              	at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:805)
              	at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:797)
              	at com.nvidia.spark.rapids.GpuOverrides$$anon$152.$anonfun$convertToGpu$19(GpuOverrides.scala:2686)
              	at scala.collection.immutable.Stream.map(Stream.scala:418)
              	at com.nvidia.spark.rapids.GpuOverrides$$anon$152.convertToGpu(GpuOverrides.scala:2686)
              	at com.nvidia.spark.rapids.GpuOverrides$$anon$152.convertToGpu(GpuOverrides.scala:2683)
              	at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:642)
              	at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:3050)
              	at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:3012)
              	at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2998)
              	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:532)
              	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:531)
              	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
              	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
              	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
              	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:531)
              	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:495)
              	at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:372)
              	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
              	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
              	at scala.collection.immutable.List.foldLeft(List.scala:91)
              	at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:371)
              	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:117)
              	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
              	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
              	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
              	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
              	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:117)
              	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:110)
              	at org.apache.spark.sql.execution.QueryExecution.$anonfun$simpleString$2(QueryExecution.scala:161)
              	at org.apache.spark.sql.execution.ExplainUtils$.processPlan(ExplainUtils.scala:115)
              	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:161)
              	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:206)
              	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:175)
              	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
              	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
              	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
              	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
              	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
              	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
              	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3516)
              	at sun.reflect.GeneratedMethodAccessor77.invoke(Unknown Source)
              	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              	at java.lang.reflect.Method.invoke(Method.java:498)
              	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
              	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
              	at py4j.Gateway.invoke(Gateway.java:282)
              	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
              	at py4j.commands.CallCommand.execute(CallCommand.java:79)
              	at py4j.GatewayConnection.run(GatewayConnection.java:238)
              	at java.lang.Thread.run(Thread.java:748)

/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/py4j-0.10.9-src.zip/py4j/protocol.py:326: Py4JJavaError

Steps/Code to reproduce bug
Run the Python tests on a Dataproc cluster with the script https://github.com/NVIDIA/spark-rapids/blob/branch-0.5/integration_tests/run_pyspark_from_build.sh#L129~L131

Expected behavior
All the Python tests PASS.

Environment details (please complete the following information)

  • Environment location: [Dataproc cluster]
@NvTimLiu NvTimLiu added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 22, 2021
@NvTimLiu NvTimLiu changed the title [BUG] qa_nightly_select_test.py::test_select FAILED on the EMR Cluster [BUG] qa_nightly_select_test.py::test_select FAILED on the Dataproc Cluster Apr 22, 2021
@NvTimLiu
Collaborator Author

This should be fixed by #2218.

@jlowe
Member

jlowe commented Apr 22, 2021

Closing as a duplicate of #2217

@jlowe jlowe closed this as completed Apr 22, 2021
@jlowe jlowe removed the ? - Needs Triage Need team to review and classify label Apr 22, 2021