[SPARK-49066][SQL][TESTS] Refactor OrcEncryptionSuite and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite`

### What changes were proposed in this pull request?
This PR moves the globally scoped test configuration `spark.hadoop.hadoop.security.key.provider.path` out of the parent `pom.xml` and `SparkBuild.scala` and into `OrcEncryptionSuite`, so that it takes effect only within `OrcEncryptionSuite`.

To achieve this, the PR also refactors `OrcEncryptionSuite`:
1. Overrides `beforeAll` to back up the contents of `CryptoUtils#keyProviderCache` and clear the cache.
2. Overrides `afterAll` to restore the contents of `CryptoUtils#keyProviderCache`.

This ensures that `CryptoUtils#keyProviderCache` is isolated while `OrcEncryptionSuite` runs.
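
For reference, the isolation pattern reduces to the following minimal sketch (`GlobalState` and `IsolatedCacheSuite` are hypothetical names; the real suite reaches the private static `CryptoUtils#keyProviderCache` through a `VarHandle`, as the diff below shows):

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

import scala.collection.mutable

import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

// Stand-in for a library-owned static cache such as `CryptoUtils#keyProviderCache`.
object GlobalState {
  val cache: JMap[String, String] = new JHashMap[String, String]()
}

class IsolatedCacheSuite extends AnyFunSuite with BeforeAndAfterAll {
  private val backup: mutable.Map[String, String] = mutable.Map.empty

  override def beforeAll(): Unit = {
    // Back up whatever earlier suites left in the cache, then start clean.
    GlobalState.cache.forEach((k, v) => backup.put(k, v))
    GlobalState.cache.clear()
    super.beforeAll()
  }

  override def afterAll(): Unit = {
    super.afterAll()
    // Put the pre-suite contents back so later suites are unaffected.
    GlobalState.cache.clear()
    backup.foreach { case (k, v) => GlobalState.cache.put(k, v) }
  }

  test("suite runs against an isolated cache") {
    assert(GlobalState.cache.isEmpty)
  }
}
```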

### Why are the changes needed?
The test configuration `spark.hadoop.hadoop.security.key.provider.path` set in the parent `pom.xml` and `SparkBuild.scala` takes effect globally, so Orc write test cases other than `OrcEncryptionSuite` could also be affected by it and end up using `test.org.apache.spark.sql.execution.datasources.orc.FakeKeyProvider.Factory`.
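
The leak comes from how Spark propagates configuration: every `spark.hadoop.*` entry is copied, with the prefix stripped, into each Hadoop `Configuration` that Spark builds, e.g. via `spark.sessionState.newHadoopConf()`. The sketch below illustrates the mechanism with a simplified stand-in for what `SparkHadoopUtil` does; it is not the actual Spark implementation:

```scala
import org.apache.hadoop.conf.Configuration

import org.apache.spark.SparkConf

object SparkHadoopPropagationDemo {
  // Simplified stand-in: copy `spark.hadoop.*` entries into a Hadoop conf.
  def appendSparkHadoopConfigs(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
    for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
      hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
    }
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf(loadDefaults = false)
      .set("spark.hadoop.hadoop.security.key.provider.path", "test:///")
    val hadoopConf = new Configuration(false)
    appendSparkHadoopConfigs(sparkConf, hadoopConf)
    // Every consumer of this Hadoop conf now resolves the fake key provider,
    // which is why a globally-set value reached unrelated Orc tests.
    println(hadoopConf.get("hadoop.security.key.provider.path")) // test:///
  }
}
```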

### Does this PR introduce _any_ user-facing change?
No, this is a test-only change.

### How was this patch tested?
Pass GitHub Actions.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47543 from LuciferYang/SPARK-49066.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
LuciferYang authored and dongjoon-hyun committed Jul 31, 2024
1 parent 8b826ab commit 06ed91a
Showing 3 changed files with 34 additions and 5 deletions.
3 changes: 0 additions & 3 deletions pom.xml
@@ -3105,7 +3105,6 @@
         <spark.ui.showConsoleProgress>false</spark.ui.showConsoleProgress>
         <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
         <spark.memory.debugFill>true</spark.memory.debugFill>
-        <spark.hadoop.hadoop.security.key.provider.path>test:///</spark.hadoop.hadoop.security.key.provider.path>
         <!-- Needed by sql/hive tests. -->
         <test.src.tables>src</test.src.tables>
         <hive.conf.validation>false</hive.conf.validation>
@@ -3162,8 +3161,6 @@
         <spark.test.docker.removePulledImage>${spark.test.docker.removePulledImage}</spark.test.docker.removePulledImage>
         <!-- Needed by sql/hive tests. -->
         <test.src.tables>__not_used__</test.src.tables>
-        <!--SPARK-42934: Need by `OrcEncryptionSuite` -->
-        <spark.hadoop.hadoop.security.key.provider.path>test:///</spark.hadoop.hadoop.security.key.provider.path>
       </systemProperties>
       <tagsToExclude>${test.exclude.tags},${test.default.exclude.tags}</tagsToExclude>
       <tagsToInclude>${test.include.tags}</tagsToInclude>
1 change: 0 additions & 1 deletion project/SparkBuild.scala
@@ -1599,7 +1599,6 @@ object TestSettings {
     (Test / javaOptions) += "-Dspark.ui.enabled=false",
     (Test / javaOptions) += "-Dspark.ui.showConsoleProgress=false",
     (Test / javaOptions) += "-Dspark.unsafe.exceptionOnMemoryLeak=true",
-    (Test / javaOptions) += "-Dspark.hadoop.hadoop.security.key.provider.path=test:///",
     (Test / javaOptions) += "-Dhive.conf.validation=false",
     (Test / javaOptions) += "-Dsun.io.serialization.extendedDebugInfo=false",
     (Test / javaOptions) += "-Dderby.system.durability=test",
35 changes: 34 additions & 1 deletion sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala
@@ -17,20 +17,53 @@
 
 package org.apache.spark.sql.execution.datasources.orc
 
+import java.lang.invoke.MethodHandles
+import java.util.{Map => JMap}
 import java.util.Random
 
-import org.apache.orc.impl.HadoopShimsFactory
+import scala.collection.mutable
+
+import org.apache.orc.impl.{CryptoUtils, HadoopShimsFactory, KeyProvider}
 
+import org.apache.spark.SparkConf
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.SharedSparkSession
 
 class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   import testImplicits._
 
+  override def sparkConf: SparkConf = {
+    super.sparkConf.set("spark.hadoop.hadoop.security.key.provider.path", "test:///")
+  }
+
+  override def beforeAll(): Unit = {
+    // Backup `CryptoUtils#keyProviderCache` and clear it.
+    keyProviderCacheRef.entrySet()
+      .forEach(e => keyProviderCacheBackup.put(e.getKey, e.getValue))
+    keyProviderCacheRef.clear()
+    super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+    super.afterAll()
+    // Restore `CryptoUtils#keyProviderCache`.
+    keyProviderCacheRef.clear()
+    keyProviderCacheBackup.foreach { case (k, v) => keyProviderCacheRef.put(k, v) }
+  }
+
   val originalData = Seq(("123456789", "dongjoon@apache.org", "Dongjoon Hyun"))
   val rowDataWithoutKey =
     Row(null, "841626795E7D351555B835A002E3BF10669DE9B81C95A3D59E10865AC37EA7C3", "Dongjoon Hyun")
 
+  private val keyProviderCacheBackup: mutable.Map[String, KeyProvider] = mutable.Map.empty
+
+  private val keyProviderCacheRef: JMap[String, KeyProvider] = {
+    val clazz = classOf[CryptoUtils]
+    val lookup = MethodHandles.privateLookupIn(clazz, MethodHandles.lookup())
+    lookup.findStaticVarHandle(clazz, "keyProviderCache", classOf[JMap[_, _]])
+      .get().asInstanceOf[JMap[String, KeyProvider]]
+  }
+
   test("Write and read an encrypted file") {
     val conf = spark.sessionState.newHadoopConf()
     val provider = HadoopShimsFactory.get.getHadoopKeyProvider(conf, new Random)
