Make shims.v2.ParquetCachedBatchSerializer protected #4560
Conversation
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
build
I'm not clear on the goal of this PR. IIUC, marking these as Scala protected only enforces something at compile time, not at runtime. It will not "hide" them in the resulting jar, and they would still be directly accessible if this were a normal jar. Therefore I don't think this accomplishes anything if we're worried about users inadvertently or directly accessing these classes in a normal jar.
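To illustrate the compile-time-only point, here is a minimal sketch (hypothetical names, not from the spark-rapids codebase): a class restricted with a qualified access modifier cannot be referenced in source from outside its package, yet it remains reachable via reflection because the JVM bytecode is emitted as public.

```scala
// Hypothetical stand-in for an internal plugin class; all names here
// are illustrative, not taken from spark-rapids.
package com.example.internal {
  // `private[internal]` is enforced by the Scala compiler only: source
  // code in other packages cannot name HiddenImpl directly.
  private[internal] class HiddenImpl {
    def greet(): String = "still reachable"
  }
}

package com.example.demo {
  object VisibilityDemo {
    // Writing `new com.example.internal.HiddenImpl` here would be a
    // compile error, but the emitted bytecode for the class is public,
    // so reflective loading succeeds from anywhere at runtime.
    def loadHidden(): String = {
      val cls = Class.forName("com.example.internal.HiddenImpl")
      val obj = cls.getDeclaredConstructor().newInstance()
      cls.getMethod("greet").invoke(obj).asInstanceOf[String]
    }

    def main(args: Array[String]): Unit = println(loadHidden())
  }
}
```

This is why a Scala access modifier alone does not keep users of a normal jar away from a class.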
But the dist jar is not a normal jar. Almost all classes are "hidden" in the "parallel-world" classloader setup. Users that attempt to compile against this jar will not be able to find classes in their normal package locations, and users asking for these classes on the spark command-line will similarly not be able to locate them at their expected locations. Only the truly public classes, of which there are very few, are placed in their expected locations in the jar and therefore are directly accessible.
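The "parallel-world" layout described above can be sketched as follows. This is a simplified illustration of the idea, not the actual spark-rapids shim loader: class bytes live under a version-specific prefix inside the jar (e.g. something like `spark311/com/nvidia/...`), so a vanilla classloader asking for the class at its normal package path finds nothing.

```scala
import java.net.{URL, URLClassLoader}

// Minimal sketch (hypothetical, not the real plugin code): resolve class
// bytes only under a version-specific "world" prefix inside the jar(s).
class ParallelWorldLoader(jarUrls: Array[URL], world: String)
    extends URLClassLoader(jarUrls, null) {
  override protected def findClass(name: String): Class[_] = {
    // Look the class up under the selected world's prefix only, e.g.
    // "spark311/com/example/Foo.class" instead of "com/example/Foo.class".
    val path = s"$world/${name.replace('.', '/')}.class"
    val in = getResourceAsStream(path)
    if (in == null) throw new ClassNotFoundException(name)
    try {
      val bytes = in.readAllBytes()
      defineClass(name, bytes, 0, bytes.length)
    } finally in.close()
  }
}
```

Under such a layout, a class that was not copied to its plain package location is simply not resolvable through the normal classpath, regardless of its Scala access modifier.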
sql-plugin/src/main/scala/com/nvidia/spark/ExclusiveModeGpuDiscoveryPlugin.scala
The goal of #4022 is to eliminate a source of confusion for developers of spark-rapids internals, inspired by some discussions with @razajafri. The current PR description is not quite accurate. If we go ahead with this, the PR should mark the ancestor classes / traits, or otherwise internal classes that are not settable via config, differently than public classes. I think @jlowe's point about the runtime implications of the parallel-worlds jar is valid, but it is orthogonal to the issue.
sql-plugin/src/main/scala/com/nvidia/spark/ExclusiveModeGpuDiscoveryPlugin.scala
@@ -444,7 +444,7 @@ trait VisibleShuffleManager {
  * @param conf
  * @param isDriver
  */
-abstract class ProxyRapidsShuffleInternalManagerBase(
+protected abstract class ProxyRapidsShuffleInternalManagerBase(
Same question?
Removed it. I had misunderstood the bug.
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@jlowe, @razajafri and @gerashegalov. I didn't understand the bug properly earlier and made those changes. I have updated the PR description. Please take another look.
...s-spark310+/src/test/scala/com/nvidia/spark/rapids/shims/v2/Spark310ParquetWriterSuite.scala
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
build
@nartal1 follow these steps so you never have to worry about updating the copyrights manually: https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/CONTRIBUTING.md#pre-commit-hooks
This fixes #4022.
Mark com.nvidia.spark.rapids.shims.v2.ParquetCachedBatchSerializer as protected, since that class is not configurable via SparkConf/RapidsConf.