Update script to audit multiple Spark versions (#539)
* Update audit script to support multiple Spark versions

* Update readme and validation script

* addressed review comments

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* update pom

* empty commit

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* addressed review comments

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* addressed review comments

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
nartal1 authored Aug 13, 2020
1 parent ae4b00d commit afbb613
Showing 4 changed files with 71 additions and 6 deletions.
10 changes: 8 additions & 2 deletions api_validation/README.md
@@ -1,7 +1,9 @@
# API validation script for Rapids Plugin

API validation script checks the compatibility of community Spark Execs and GPU Execs in the Rapids Plugin for Spark.
For example: HashAggregateExec with GpuHashAggregateExec.
The script can be used to audit different versions of Spark (3.0.0, 3.0.1-SNAPSHOT and 3.1.0-SNAPSHOT).
The script prints Execs where validation fails.
Validation fails when:
1) The number of parameters differ between community Spark Execs and Gpu Execs.
2) Parameters to the exec have a type mismatch.
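
The two failure conditions above can be sketched with Scala runtime reflection. The following is a minimal, hypothetical illustration (`ParamCheck` and its naming are not part of the plugin's actual code) of comparing the primary-constructor parameters of two Exec classes:

```scala
import scala.reflect.runtime.universe._

object ParamCheck {
  // Returns a description of each mismatch between the primary-constructor
  // parameter lists of two types; an empty result means validation passes.
  def mismatches(a: Type, b: Type): Seq[String] = {
    def params(t: Type): List[Symbol] =
      t.decl(termNames.CONSTRUCTOR).asTerm.alternatives.head.asMethod.paramLists.flatten
    val (pa, pb) = (params(a), params(b))
    if (pa.length != pb.length) {
      // Failure condition 1: the number of parameters differs
      Seq(s"parameter count differs: ${pa.length} vs ${pb.length}")
    } else {
      // Failure condition 2: a parameter has a type mismatch
      pa.zip(pb).collect {
        case (x, y) if !(x.typeSignature =:= y.typeSignature) =>
          s"${x.name}: ${x.typeSignature} vs ${y.name}: ${y.typeSignature}"
      }
    }
  }
}
```

A call such as `ParamCheck.mismatches(typeOf[HashAggregateExec], typeOf[GpuHashAggregateExec])` would then report the Execs whose signatures have drifted apart.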
@@ -15,7 +17,11 @@ It requires cudf, rapids-4-spark and spark jars.

```
cd api_validation
# To run the validation script on all versions of Spark (3.0.0, 3.0.1-SNAPSHOT and 3.1.0-SNAPSHOT)
sh auditAllVersions.sh
# To run the script on a particular version, use a profile (spark300, spark301 or spark310)
mvn scala:run -P spark300
```

# Output
19 changes: 19 additions & 0 deletions api_validation/auditAllVersions.sh
@@ -0,0 +1,19 @@
#!/bin/bash
# Copyright (c) 2020, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -ex

mvn scala:run -P spark300
mvn scala:run -P spark301
mvn scala:run -P spark310
28 changes: 28 additions & 0 deletions api_validation/pom.xml
@@ -27,6 +27,27 @@
<artifactId>rapids-4-spark-api-validation</artifactId>
<version>0.2.0-SNAPSHOT</version>

<profiles>
<profile>
<id>spark300</id>
<properties>
<spark.version>${spark300.version}</spark.version>
</properties>
</profile>
<profile>
<id>spark301</id>
<properties>
<spark.version>${spark301.version}</spark.version>
</properties>
</profile>
<profile>
<id>spark310</id>
<properties>
<spark.version>${spark310.version}</spark.version>
</properties>
</profile>
</profiles>
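
These profiles assume the corresponding version properties are defined in the parent pom, along the lines of the following fragment (property names are taken from the profiles above; the exact values shown are assumptions based on the versions named in the README):

```xml
<properties>
  <spark300.version>3.0.0</spark300.version>
  <spark301.version>3.0.1-SNAPSHOT</spark301.version>
  <spark310.version>3.1.0-SNAPSHOT</spark310.version>
</properties>
```

Selecting a profile with `-P` then resolves `${spark.version}` to the matching property, which the `spark-sql` dependency below picks up.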

<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
@@ -35,6 +56,7 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
@@ -53,6 +75,12 @@
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-aggregator_${scala.binary.version}</artifactId>
<version>0.2.0-SNAPSHOT</version>
<scope>provided</scope>
</dependency>
</dependencies>

<build>
@@ -69,6 +69,16 @@ object ApiValidation extends Logging {
val gpuKeys = gpuExecs.keys
var printNewline = false

val sparkToShimMap = Map("3.0.0" -> "spark300", "3.0.1" -> "spark301", "3.1.0" -> "spark310")
val sparkVersion = ShimLoader.getSparkShims.getSparkShimVersion.toString
// There is no separate implementation for Execs in spark-3.0.1.
val shimVersion = sparkToShimMap(sparkVersion) match {
  case "spark301" => "spark300"
  case v => v
}

gpuKeys.foreach { e =>
// Get SparkExecs argNames and types
val sparkTypes = classToTypeTag(e)
@@ -83,13 +93,15 @@
val execType = sparkTypes.tpe.toString.split('.').last
val gpu = execType match {
case "BroadcastExchangeExec" => s"org.apache.spark.sql.rapids.execution.Gpu" + execType
case "BroadcastHashJoinExec" => s"com.nvidia.spark.rapids.shims." + shimVersion +
  ".Gpu" + execType
case "FileSourceScanExec" => s"org.apache.spark.sql.rapids.shims." + shimVersion +
  ".Gpu" + execType
case "CartesianProductExec" => s"org.apache.spark.sql.rapids.Gpu" + execType
case "BroadcastNestedLoopJoinExec" =>
  s"com.nvidia.spark.rapids.shims." + shimVersion + ".Gpu" + execType
case "SortMergeJoinExec" | "ShuffledHashJoinExec" =>
  s"com.nvidia.spark.rapids.shims." + shimVersion + ".GpuShuffledHashJoinExec"
case "SortAggregateExec" => s"com.nvidia.spark.rapids.GpuHashAggregateExec"
case _ => s"com.nvidia.spark.rapids.Gpu" + execType
}
