
[SPARK-49249][SPARK-49122][Connect][SQL] Add addArtifact API to the Spark SQL Core #47631

Closed
wants to merge 14 commits

Conversation

@xupefei xupefei commented Aug 6, 2024

What changes were proposed in this pull request?

This PR improves Spark SQL Core by adding a set of `addArtifact` APIs to `SparkSession`. These APIs were first introduced in Spark Connect a while ago.

Adding the same APIs to PySpark is left to a follow-up task.
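For illustration, a minimal sketch of how the new API is expected to be used on a Classic `SparkSession`. The overloads mirror the existing Spark Connect `addArtifact` variants; the exact signatures and the file paths below are assumptions, not the merged code:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Register a local jar with the session; its classes should become resolvable
// by code running for this session (the path is a placeholder).
spark.addArtifact("/path/to/extra-lib.jar")

// The Connect client also offers an in-memory variant (bytes plus a target
// file name); the same shape is assumed to be mirrored here.
val payload: Array[Byte] = Files.readAllBytes(Paths.get("/path/to/config.json"))
spark.addArtifact(payload, "config.json")
```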

Why are the changes needed?

To close the API compatibility gap between Spark Connect and Spark Classic.

Does this PR introduce any user-facing change?

Yes, users will be able to use some new APIs.

How was this patch tested?

Added new tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@xupefei xupefei changed the title [SPARK-49122][Connect][SQL] Add addArtifact API to the Spark SQL Core [SPARK-49122][Connect][SQL] Add addArtifact API to the Spark SQL Core Aug 6, 2024
@hvanhovell hvanhovell left a comment

Looks pretty good!

@hvanhovell

@xupefei can you check if this works with UDFs?

@xupefei xupefei changed the title [SPARK-49122][Connect][SQL] Add addArtifact API to the Spark SQL Core [SPARK-49249][SPARK-49122][Connect][SQL] Add addArtifact API to the Spark SQL Core Aug 15, 2024

xupefei commented Aug 19, 2024

> @xupefei can you check if this works with UDFs?

I checked and found that it doesn't work. The Spark Core session currently does not use the class loader provided by `ArtifactManager`:

```scala
def getContextOrSparkClassLoader: ClassLoader =
  Option(Thread.currentThread().getContextClassLoader).getOrElse(getSparkClassLoader)

// scalastyle:off classforname
/**
 * Preferred alternative to Class.forName(className), as well as
 * Class.forName(className, initialize, loader) with current thread's ContextClassLoader.
 */
def classForName[C](
    className: String,
    initialize: Boolean = true,
    noSparkClassLoader: Boolean = false): Class[C] = {
  if (!noSparkClassLoader) {
    Class.forName(className, initialize, getContextOrSparkClassLoader).asInstanceOf[Class[C]]
  } else {
    Class.forName(className, initialize, Thread.currentThread().getContextClassLoader).
      asInstanceOf[Class[C]]
  }
  // scalastyle:on classforname
}
```

For now we have to wrap the code with `artifactManager.withResources` to replace the class loader. This issue needs to be fixed.
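A rough sketch of that workaround, assuming the session exposes its `ArtifactManager` as `artifactManager` and that `withResources` installs the artifact-aware class loader as the thread's context class loader for the duration of the block (names other than `addArtifact`/`withResources` are placeholders, and `artifactManager` may not be reachable from user code in the final API):

```scala
// Sketch only: the jar path and UDF name are placeholders.
spark.addArtifact("/path/to/udf-classes.jar")

spark.artifactManager.withResources {
  // While this block runs, the context class loader can see the added jar, so
  // Utils.classForName (above) resolves classes shipped via addArtifact.
  spark.sql("SELECT my_udf(id) FROM range(10)").show()
}
```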

@hvanhovell hvanhovell left a comment

One small nit...

@hvanhovell hvanhovell left a comment

LGTM

@asfgit asfgit closed this in 0863af4 Aug 30, 2024
@xupefei xupefei deleted the core-add-artifact branch August 30, 2024 12:04
asfgit pushed a commit that referenced this pull request Sep 3, 2024
…Artifact` API

### What changes were proposed in this pull request?

This PR is a follow-up of #47631 which adds a docstring to the `addArtifact` API. The doc was missing from the previous PR.

### Why are the changes needed?

The previous PR missed a docstring.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Not needed.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47974 from xupefei/add-artifact-doc.

Authored-by: Paddy Xu <xupaddy@gmail.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
IvanK-db pushed commits to IvanK-db/spark that referenced this pull request Sep 20, 2024