java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport #672
Comments
@pvchandu thanks for trying the plugin and reporting the issue. I just tried out the instructions again and they are working fine for me. Did you pick the Databricks 7.0 ML runtime? Are you using AWS or Azure?
One thing I would suggest is to remove the init script from the cluster configuration and make sure the cluster starts up fine and you can run. If that works, then there is probably a problem with the init script, so perhaps try regenerating it.
@tgravescs, I tested this out with NC6s_v3 as mentioned in the documentation. It worked well.
We don't support nodes with multiple GPUs on Databricks right now. The plugin has a restriction that each executor has only 1 GPU, and the last time I tried on Databricks, it did not support configuring a multi-GPU node to have multiple executors, each with 1 GPU. Normally in Apache Spark you would set spark.executor.resource.gpu.amount=1 to get 1 GPU per executor, but the last time I tried, that wasn't working on Databricks. Feel free to try it and see if anything has changed there.
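For reference, the setting mentioned above can be sketched as a small helper that builds the relevant options for plain Apache Spark. This is only an illustration: the function name and the even split of cores are assumptions made here, not something from the plugin documentation, and as noted above these settings reportedly had no effect on Databricks at the time.

```python
def one_gpu_per_executor_conf(gpus_per_node: int, cores_per_node: int) -> dict:
    """Build Spark settings that split one multi-GPU node into 1-GPU executors.

    Hypothetical helper for illustration; not part of Spark or the plugin.
    """
    if gpus_per_node < 1 or cores_per_node < gpus_per_node:
        raise ValueError("need at least one GPU and at least one core per GPU")
    return {
        # The setting quoted in the comment above: one GPU per executor.
        "spark.executor.resource.gpu.amount": "1",
        # Assumption: divide the node's cores evenly among the executors.
        "spark.executor.cores": str(cores_per_node // gpus_per_node),
    }


# Example: a hypothetical node with 4 GPUs and 24 cores.
conf = one_gpu_per_executor_conf(gpus_per_node=4, cores_per_node=24)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

On a cluster manager that honors these options, the printed flags could be passed to spark-submit so that each executor is scheduled with a single GPU.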
That makes sense now. By default, each node is one executor, and we cannot change that in Databricks even today. Are there any plans to support multiple GPUs on a single node?
We don't have any concrete plans, because on any other setup you would just split one node into multiple executors. I'll raise with others that this is a limitation on Databricks.
Thanks Thomas. This is pretty limiting in the Databricks environment, given that the majority of users are moving to Databricks. I added the following feedback for Databricks as well. I would appreciate it if you could collaborate with Databricks and figure this story out.
Closing this as the NoClassDefFoundError was resolved and the multiple GPUs per executor request is tracked by #1486.
I am trying out the new RAPIDS Accelerator for Databricks. I am running the mortgage notebook to get started.
I followed the instructions in the documentation https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-with-rapids-accelerator-on-databricks.html.
When I run the code cell to read the data, it fails with the following error.
Error:
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
Full Error:
Py4JJavaError Traceback (most recent call last)
<command-1671055577733705> in <module>
3 # we want a few big files instead of lots of small files
4 spark.conf.set('spark.sql.files.maxPartitionBytes', '200G')
5 acq = read_acq_csv(spark, orig_acq_path)
6 acq.repartition(12).write.parquet(tmp_acq_path, mode='overwrite')
7 perf = read_perf_csv(spark, orig_perf_path)
<command-1671055577733703> in read_acq_csv(spark, path)
82 .option('delimiter', '|')
83 .schema(_csv_acq_schema)
84 .load(path)
85 .withColumn('quarter', _get_quarter_from_csv_file_name())
86
/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
176 self.options(**options)
177 if isinstance(path, basestring):
178 return self._df(self._jreader.load(path))
179 elif path is not None:
180 if type(path) != list:
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
126 def deco(*a, **kw):
127 try:
128 return f(*a, **kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o385.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at com.databricks.backend.daemon.driver.ClassLoaders$ReplWrappingClassLoader.loadClass(ClassLoaders.scala:65)
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.access$700(ServiceLoader.java:323)
at java.util.ServiceLoader$LazyIterator$2.run(ServiceLoader.java:407)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:409)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255)
at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:700)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:784)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:317)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.ReadSupport
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 63 more
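A note on reading this trace: the package org.apache.spark.sql.sources.v2 belongs to the Spark 2.x DataSource V2 API and is not present in Spark 3.0, so one possible reading is that a jar built against Spark 2.x was picked up while the ServiceLoader scanned for data sources. The thread does not confirm the actual cause. The sketch below only pulls the missing class name out of a pasted trace and checks it against that package; the helper names and the prefix list are assumptions made for illustration.

```python
import re

# Assumption: package prefixes that exist in Spark 2.x but not in Spark 3.0.
SPARK2_ONLY_PREFIXES = ("org.apache.spark.sql.sources.v2.",)


def missing_class(trace: str):
    """Return the class named by the first NoClassDefFoundError, or None."""
    match = re.search(r"java\.lang\.NoClassDefFoundError: ([\w/$.]+)", trace)
    # The JVM reports the class with '/' separators; convert to dotted form.
    return match.group(1).replace("/", ".") if match else None


def looks_like_spark2_api(class_name: str) -> bool:
    """True if the class lives in a package listed as Spark 2.x only."""
    return class_name.startswith(SPARK2_ONLY_PREFIXES)


trace = (
    "Py4JJavaError: An error occurred while calling o385.load.\n"
    ": java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport\n"
    "at java.lang.ClassLoader.defineClass1(Native Method)\n"
)
name = missing_class(trace)
print(name, looks_like_spark2_api(name))
```

If the check returns True, a reasonable next step is to review the libraries and init scripts attached to the cluster for anything compiled against Spark 2.x.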