Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make shims.v2.ParquetCachedBatchSerializer as protected #4560

Merged
merged 3 commits into from
Jan 21, 2022

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented Jan 19, 2022

This fixes #4022.
Making com.nvidia.spark.rapids.shims.v2.ParquetCachedBatchSerializer as protected since that class is not configurable by SparkConf/RapidsConf.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1 nartal1 added the bug Something isn't working label Jan 19, 2022
@nartal1 nartal1 added this to the Jan 10 - Jan 28 milestone Jan 19, 2022
@nartal1 nartal1 self-assigned this Jan 19, 2022
@nartal1
Copy link
Collaborator Author

nartal1 commented Jan 19, 2022

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on the goal of this PR. IIUC, marking these as Scala protected is only going to enforce something at compile time, not runtime. I don't think it is going to "hide" them in the resulting jar, and they would still be directly accessible if this were a normal jar. Therefore I don't think this accomplishes anything if we're worried about users inadvertently or directly accessing these classes in the jar if the dist jar was a normal jar.

But the dist jar is not a normal jar. Almost all classes are "hidden" in the "parallel-world" classloader setup. Users that attempt to compile against this jar will not be able to find classes in their normal package locations, and users asking for these classes on the spark command-line will similarly not be able to locate them at their expected locations. Only the truly public classes, of which there are very few, are placed in their expected locations in the jar and therefore are directly accessible.

@gerashegalov
Copy link
Collaborator

The goal of #4022 is to eliminate a source of confusion for developers of spark-rapids internals, inspired by some discussions with @razajafri . The current PR description is not quite accurate. If we go ahead with this, the PR should mark the ancestor classes / traits or otherwise internal classes that are not settable via config differently than public classes. I think the point @jlowe's point about runtime implications of the parallel worlds jar is valid but it is orthogonal to the issue.

@@ -444,7 +444,7 @@ trait VisibleShuffleManager {
* @param conf
* @param isDriver
*/
abstract class ProxyRapidsShuffleInternalManagerBase(
protected abstract class ProxyRapidsShuffleInternalManagerBase(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it. I had misunderstood the bug.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1
Copy link
Collaborator Author

nartal1 commented Jan 21, 2022

@jlowe, @razajafri and @gerashegalov . I didn't understand the bug properly earlier and made those changes. I have updated the PR description. Please take another look.

razajafri
razajafri previously approved these changes Jan 21, 2022
@nartal1 nartal1 changed the title Make shims.v2.ParquetCachedBatchSerializer and similar classes as protected Make shims.v2.ParquetCachedBatchSerializer as protected Jan 21, 2022
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@jlowe
Copy link
Member

jlowe commented Jan 21, 2022

build

@razajafri
Copy link
Collaborator

@nartal1 follow these steps to never having to worry about updating the copyrights manually

https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/CONTRIBUTING.md#pre-commit-hooks

@nartal1 nartal1 merged commit a409d0e into NVIDIA:branch-22.04 Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants