Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.3.3
- Fix Version/s: None
- Component/s: None
Description
scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path

scala> val path: Path = new Path("gs://test_dd123/")
path: org.apache.hadoop.fs.Path = gs://test_dd123/

scala> path.suffix("/num=123")
java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:150)
  at org.apache.hadoop.fs.Path.<init>(Path.java:129)
  at org.apache.hadoop.fs.Path.suffix(Path.java:450)
Path.suffix throws a NullPointerException when writing into a GCS bucket root.
In our organisation, we use a GCS bucket root location as the location of a Hive table. Dataproc's latest 2.1 image uses Hadoop 3.3.3, so this needs to be fixed in the 3.3.3 line.
Spark Scala code to reproduce this issue:
val DF = Seq(("test1", 123)).toDF("name", "num")
DF.write.option("path", "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")

java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:141)
  at org.apache.hadoop.fs.Path.<init>(Path.java:120)
  at org.apache.hadoop.fs.Path.suffix(Path.java:441)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
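The stack trace is consistent with how Path.suffix is implemented: it returns new Path(getParent(), getName() + suffix), and getParent() is null for a root path such as gs://test_dd123/, so the Path(Path, String) constructor dereferences null. A minimal, self-contained Java sketch of that pattern (MiniPath is a hypothetical stand-in, not the real org.apache.hadoop.fs.Path):

```java
// MiniPath: a simplified, hypothetical sketch of the failing pattern in
// org.apache.hadoop.fs.Path -- NOT the real Hadoop class.
public class MiniPath {
    private final String authority; // e.g. "gs://test_dd123"
    private final String path;      // e.g. "/" or "/dir"

    public MiniPath(String authority, String path) {
        this.authority = authority;
        this.path = path;
    }

    // Mirrors Path(Path parent, String child): the parent is dereferenced
    // unconditionally, so a null parent throws NullPointerException here.
    public MiniPath(MiniPath parent, String child) {
        this.authority = parent.authority;
        this.path = parent.path.endsWith("/") ? parent.path + child
                                              : parent.path + "/" + child;
    }

    // A bucket root ("/") has no parent component, matching Hadoop's behaviour.
    public MiniPath getParent() {
        if (path.equals("/")) return null;
        int lastSlash = path.lastIndexOf('/');
        return new MiniPath(authority, lastSlash == 0 ? "/" : path.substring(0, lastSlash));
    }

    public String getName() {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Mirrors Path.suffix(String): new Path(getParent(), getName() + suffix),
    // which is exactly where the NPE originates on a root path.
    public MiniPath suffix(String s) {
        return new MiniPath(getParent(), getName() + s);
    }

    @Override public String toString() { return authority + path; }

    public static void main(String[] args) {
        // Non-root paths work as expected.
        MiniPath dir = new MiniPath("gs://test_dd123", "/dir");
        System.out.println(dir.suffix("_v2"));       // gs://test_dd123/dir_v2

        // The bucket root has no parent, so suffix() blows up.
        MiniPath root = new MiniPath("gs://test_dd123", "/");
        try {
            root.suffix("/num=123");
        } catch (NullPointerException e) {
            System.out.println("NPE on bucket root, as in this issue");
        }
    }
}
```

Until a fix lands, a caller-side workaround is to avoid suffix() on paths that may be a root and resolve the child directly against the path (e.g. new Path(path, "num=123")), since the two-argument constructor with a non-null parent does not hit this code path.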
Attachments
Issue Links
- depends upon
  - HADOOP-18652 Path.suffix raises NullPointerException (Resolved)
  - MAPREDUCE-7452 ManifestCommitter to support / as a destination (Open)
- is duplicated by
  - SPARK-44883 Spark insertInto with location GCS bucket root causes NPE (Resolved)