[SPARK-47699][BUILD] Upgrade gcs-connector to 2.2.21 and add a note for 3.0.0

This PR aims to upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0.

This PR aims to upgrade `gcs-connector` to bring the latest bug fixes.

However, we stay on 2.2.21 instead of moving to 3.0.0, because 3.0.0 shades an older Guava than 2.2.21 does:
- GoogleCloudDataproc/hadoop-connectors#1114
  - `gcs-connector` 2.2.21 has shaded Guava 32.1.2-jre.
    - https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/15c8ee41a15d6735442f36333f1d67792c93b9cf/pom.xml#L100

  - `gcs-connector` 3.0.0 has shaded Guava 31.1-jre.
    - https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/667bf17291dbaa96a60f06df58c7a528bc4a8f79/pom.xml#L97
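The ordering of the two shaded Guava versions can be double-checked with a small sketch like the one below. It only compares the two version strings quoted above; the variable names are illustrative, and `sort -V` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Guava versions shaded by each gcs-connector release, as listed above.
guava_in_2_2_21="32.1.2-jre"
guava_in_3_0_0="31.1-jre"

# sort -V (GNU coreutils) orders version strings; the last line is the newest.
newest=$(printf '%s\n' "$guava_in_2_2_21" "$guava_in_3_0_0" | sort -V | tail -n 1)

if [ "$newest" = "$guava_in_2_2_21" ]; then
  verdict="gcs-connector 2.2.21 shades the newer Guava ($guava_in_2_2_21)"
else
  verdict="gcs-connector 3.0.0 shades the newer Guava ($guava_in_3_0_0)"
fi
echo "$verdict"
```

In other words, moving to 3.0.0 would take the shaded Guava backwards from 32.1.2-jre to 31.1-jre.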

Does this PR introduce any user-facing change? No.

How was this patch tested? Manually.
```
$ dev/make-distribution.sh -Phadoop-cloud
$ cd dist
$ export KEYFILE=~/.ssh/apache-spark.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
            -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
            -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
            -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.2)
Type in expressions to have them evaluated.
Type :help for more information.
{"ts":"2024-04-02T13:08:31.513-0700","level":"WARN","msg":"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable","logger":"org.apache.hadoop.util.NativeCodeLoader"}
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1712088511841).
Spark session available as 'spark'.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
val res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```

Was this patch authored or co-authored using generative AI tooling? No.

Closes apache#45824 from dongjoon-hyun/SPARK-47699.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun authored and Steve Vaughan committed May 12, 2024
1 parent 6fc6766 commit 408a7bf
Showing 2 changed files with 3 additions and 2 deletions.
2 changes: 1 addition & 1 deletion dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -60,7 +60,7 @@ datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
 derby/10.14.2.0//derby-10.14.2.0.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
 flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
-gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar
+gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
3 changes: 2 additions & 1 deletion pom.xml
@@ -162,7 +162,8 @@
 <aws.java.sdk.version>1.11.655</aws.java.sdk.version>
 <!-- the producer is used in tests -->
 <aws.kinesis.producer.version>0.12.8</aws.kinesis.producer.version>
-<gcs-connector.version>hadoop3-2.2.20</gcs-connector.version>
+<!-- Do not use 3.0.0: https://github.com/GoogleCloudDataproc/hadoop-connectors/issues/1114 -->
+<gcs-connector.version>hadoop3-2.2.21</gcs-connector.version>
 <!-- org.apache.httpcomponents/httpclient-->
 <commons.httpclient.version>4.5.14</commons.httpclient.version>
 <commons.httpcore.version>4.4.16</commons.httpcore.version>
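
The two hunks have to agree: the version in the `gcs-connector.version` property should match the version segment of the dependency manifest entry. A minimal sketch of that check follows, using the two changed lines copied inline as sample data rather than reading a real checkout. The variable names are illustrative, and the reading of the manifest entry as `artifact/version/classifier/file` is an assumption based on the entries shown in this diff.

```shell
#!/usr/bin/env bash
# Sample inputs copied from the two hunks in this commit; in a real checkout
# these would come from pom.xml and dev/deps/spark-deps-hadoop-3-hive-2.3.
pom_line='<gcs-connector.version>hadoop3-2.2.21</gcs-connector.version>'
deps_line='gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar'

# Extract the text between the opening and closing property tags.
pom_version=$(printf '%s\n' "$pom_line" \
  | sed -E 's|.*<gcs-connector.version>([^<]+)</gcs-connector.version>.*|\1|')

# Assumed entry layout: artifact/version/classifier/file, so field 2 is the version.
deps_version=$(printf '%s\n' "$deps_line" | cut -d/ -f2)

if [ "$pom_version" = "$deps_version" ]; then
  status="in sync"
else
  status="MISMATCH"
fi
echo "pom.xml=$pom_version manifest=$deps_version ($status)"
```

With the lines from this commit, both values are `hadoop3-2.2.21`, so the sketch reports them as in sync.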
