
[BUG] test_group_apply_udf and test_group_apply_udf_more_types hang on Databricks 9.1 #4599

Closed
jlowe opened this issue Jan 21, 2022 · 1 comment · Fixed by #4618


jlowe commented Jan 21, 2022

Running test_group_apply_udf or test_group_apply_udf_more_types hangs in the Databricks 9.1 environment. There is no CPU utilization, so it is not an infinite loop. A stack trace shows the code waiting for data from Python that never arrives:

"Executor task launch worker for task 1.0 in stage 6.0 (TID 45)" #79 daemon prio=5 os_prio=0 tid=0x00007fab6c140000 nid=0x5876 runnable [0x00007faaa43be000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	- locked <0x00000000fa10ad40> (a java.io.BufferedInputStream)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at org.apache.spark.sql.rapids.execution.python.GpuPythonArrowOutput$$anon$1.read(GpuArrowEvalPythonExec.scala:328)
	at org.apache.spark.sql.rapids.execution.python.GpuPythonArrowOutput$$anon$1.read(GpuArrowEvalPythonExec.scala:285)
@jlowe added the `bug` and `? - Needs Triage` labels Jan 21, 2022

firestarman commented Jan 24, 2022

There is a Databricks-specific config, spark.databricks.execution.pandasZeroConfConversion.groupbyApply.enabled, which is false by default. The test passes after setting this config to true.

It seems DB 9.1 supports disabling this 'zero-conf-conversion' feature and has it disabled by default, while the plugin does not yet support running with it disabled.
That is to say, the correct Python runner (the grouped Python runner or the base Arrow Python runner) should be picked according to this config when the runner is created in the GpuFlatMapGroupInPandas operator.
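The fix described above could be sketched roughly as below. This is only an illustration of the idea, not plugin code: the object name `RunnerPicker`, the method `pickRunner`, the runner name strings, and the mapping of true/false to the grouped vs. base Arrow runner are all assumptions for the sketch.

```scala
// Illustrative sketch: choose a Python runner based on the Databricks
// zero-conf-conversion config. All names here are hypothetical.
object RunnerPicker {
  val ZeroConfKey =
    "spark.databricks.execution.pandasZeroConfConversion.groupbyApply.enabled"

  // Returns which runner the operator should build for the given conf.
  // The config defaults to false on DB 9.1, so the base Arrow runner is
  // the assumed default path.
  def pickRunner(conf: Map[String, String]): String = {
    val zeroConfEnabled = conf.getOrElse(ZeroConfKey, "false").toBoolean
    if (zeroConfEnabled) "GroupedPythonRunner"  // zero-conf conversion path
    else "BaseArrowPythonRunner"                // default on DB 9.1
  }
}
```

With no config set, the sketch selects the base Arrow runner, matching the DB 9.1 default that the plugin was not accounting for.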
