Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 3.3.0
- Fix Version/s: None
- Component/s: None
Description
In our organisation, we use a GCS bucket root location as the external location for our Hive table. Dataproc's latest 2.1 image uses Spark 3.3.0, so this needs to be fixed there.
Spark Scala code to reproduce this issue:
val DF = Seq(("test1", 123)).toDF("name", "num")
DF.write.option("path", "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")

The insertInto throws:

java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:141)
  at org.apache.hadoop.fs.Path.<init>(Path.java:120)
  at org.apache.hadoop.fs.Path.suffix(Path.java:441)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
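For comparison, a minimal sketch of the working case, assuming a subdirectory location (gs://test_dd123/warehouse/ and table_name_ok below are illustrative, not from the original report). A non-root path has a parent, so the same insertInto flow does not hit the NPE:

// Hedged comparison: identical flow, but the table location is a
// subdirectory of the bucket rather than its root.
val okDF = Seq(("test1", 123)).toDF("name", "num")
okDF.write
  .option("path", "gs://test_dd123/warehouse/")
  .mode(SaveMode.Overwrite)
  .partitionBy("num")
  .format("orc")
  .saveAsTable("schema_name.table_name_ok")

Seq(("test2", 125)).toDF("name", "num")
  .write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .insertInto("schema_name.table_name_ok")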
The issue appears to come from Hadoop's Path class:
scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path

scala> val path: Path = new Path("gs://test_dd123/")
path: org.apache.hadoop.fs.Path = gs://test_dd123/

scala> path.suffix("/num=123")
java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:150)
  at org.apache.hadoop.fs.Path.<init>(Path.java:129)
  at org.apache.hadoop.fs.Path.suffix(Path.java:450)
Path.suffix throws an NPE when the write target is a GCS bucket root.
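The NPE is consistent with how Path.suffix is implemented: it rebuilds the path from getParent(), and a bucket root has no parent. A minimal sketch of that chain, with the suffix body paraphrased from Hadoop's source (the gs://test_dd123/warehouse path is an illustrative assumption):

import org.apache.hadoop.fs.Path

// A bucket root such as gs://test_dd123/ has no parent component.
val root = new Path("gs://test_dd123/")
assert(root.getParent == null)

// Path.suffix is essentially: new Path(getParent(), getName() + suffix).
// With a null parent, the Path(parent, child) constructor dereferences
// parent and throws the NullPointerException seen in the trace above.
val nonRoot = new Path("gs://test_dd123/warehouse")
assert(nonRoot.getParent != null)
assert(nonRoot.suffix("/num=123").toString == "gs://test_dd123/warehouse/num=123")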
Issue Links
- duplicates HADOOP-18856: Spark insertInto with location GCS bucket root not supported (status: Open)