Details
Bug
Status: Resolved
Major
Resolution: Not A Problem
3.0.0
None
Description
When a DataFrame is saved as Parquet or CSV with partitionBy on a string column, partition values that consist of digits followed by 'f' or 'd' (for example "6f" or "7d") come back changed when the data is read again.
Example:
scala> val df = Seq(
     |   ("9q", 1),
     |   ("3k", 2),
     |   ("6f", 3),
     |   ("7f", 4),
     |   ("7d", 5)
     | ).toDF("value", "id")
df: org.apache.spark.sql.DataFrame = [value: string, id: int]

scala> df.show(false)
+-----+---+
|value|id |
+-----+---+
|9q   |1  |
|3k   |2  |
|6f   |3  |
|7f   |4  |
|7d   |5  |
+-----+---+

scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")

scala> spark.read.parquet("tmp_parquet").show(false)
+---+-----+
|id |value|
+---+-----+
|5  |7.0  |
|3  |6.0  |
|2  |3k   |
|4  |7.0  |
|1  |9q   |
+---+-----+
The same happens with the other format too. Is this a bug, or is it expected behavior?
Taken from [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]
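
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the "Not A Problem" resolution is consistent with partition column type inference on read. The snippet below assumes the inference tries numeric parsing before falling back to a string, and that the double step behaves like Java's Double.parseDouble, which accepts a trailing 'f' or 'd' type suffix. The object PartitionValueSketch and its infer helper are hypothetical names used only for illustration; they are not Spark's code and need no Spark dependency.

```scala
import scala.util.Try

// Hypothetical illustration, not Spark's implementation: try numeric types
// first and keep the raw string only when every numeric parse fails.
object PartitionValueSketch {
  def infer(raw: String): Any =
    Try[Any](raw.toInt)
      .orElse(Try[Any](raw.toLong))
      // Java's double parser accepts a float/double type suffix,
      // so "6f" and "7d" parse successfully as 6.0 and 7.0.
      .orElse(Try[Any](java.lang.Double.parseDouble(raw)))
      .getOrElse(raw)

  def main(args: Array[String]): Unit =
    // Same partition values as in the example above.
    Seq("9q", "3k", "6f", "7f", "7d").foreach { v =>
      println(s"$v -> ${infer(v)}")
    }
}
```

Under this assumption, "9q" and "3k" stay strings while "6f", "7f", and "7d" become doubles, which matches the output in the report. If that is indeed the cause, setting spark.sql.sources.partitionColumnTypeInference.enabled to false before reading should keep partition columns as strings.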