[SPARK-32147] Spark: PartitionBy changing the columns value - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: Spark Core, Spark Shell
Labels:
- spark

Description

When a DataFrame is saved as Parquet or CSV with partitionBy, partition column values made of digits followed by 'f' or 'd' (for example "6f" or "7d") come back changed when the data is read again.

Example:

scala> val df = Seq(
 | ("9q", 1),
 | ("3k", 2),
 | ("6f", 3),
 | ("7f", 4),
 | ("7d", 5)
 | ).toDF("value", "id")
df: org.apache.spark.sql.DataFrame = [value: string, id: int]
scala> df.show(false)
+-----+---+
|value|id |
+-----+---+
|  9q | 1 |
|  3k | 2 |
|  6f | 3 |
|  7f | 4 |
|  7d | 5 |
+-----+---+

scala> import org.apache.spark.sql.SaveMode
scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")
scala> spark.read.parquet("tmp_parquet").show(false)
+---+-----+
|id |value|
+---+-----+
|5  | 7.0 |
|3  | 6.0 |
|2  | 3k  |
|4  | 7.0 |
|1  | 9q  |
+---+-----+
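
A possible explanation, offered as an assumption rather than something stated in this report: when Spark reads the partition directories back, it tries to infer a type for each directory value, and a JVM double parser accepts a trailing 'f' or 'd' as a float or double suffix. The sketch below uses plain Scala (no Spark) to show which of the sample values parse as a double; `parseAsDouble` is a hypothetical helper written for illustration, not a Spark API.

```scala
import scala.util.Try

// Hypothetical helper: attempt to read a partition directory value as a Double,
// as a numeric type-inference step might. On the JVM, String.toDouble delegates
// to java.lang.Double.parseDouble, which accepts a trailing 'f'/'F'/'d'/'D' suffix.
def parseAsDouble(raw: String): Option[Double] = Try(raw.toDouble).toOption

val samples = Seq("9q", "3k", "6f", "7f", "7d")
val parsed = samples.map(s => s -> parseAsDouble(s))

parsed.foreach { case (raw, result) => println(s"$raw -> $result") }
// 9q -> None
// 3k -> None
// 6f -> Some(6.0)
// 7f -> Some(7.0)
// 7d -> Some(7.0)
```

If inference works this way, "6f", "7f" and "7d" would be seen as doubles and then rendered as "6.0" and "7.0" once the column is widened to string to fit "9q" and "3k", which matches the output above.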

The same happens with the other formats. Is this a bug or expected behavior?

Taken from Stack Overflow: https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when
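
For readers who need the original strings back, one commonly used workaround is to turn off partition column type inference before reading. The sketch below is a minimal example under assumptions: it builds its own local session (in spark-shell the existing `spark` can be used instead), reuses the path and data from the example above, and relies on the `spark.sql.sources.partitionColumnTypeInference.enabled` setting. It is not taken from this issue's resolution.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partition-inference-sketch")
  .getOrCreate()
import spark.implicits._

// Same data and write as in the report.
val df = Seq(("9q", 1), ("3k", 2), ("6f", 3), ("7f", 4), ("7d", 5)).toDF("value", "id")
df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")

// With inference disabled, partition directory names are read back as plain strings.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

val restored = spark.read.parquet("tmp_parquet")
val values = restored.select("value").as[String].collect().sorted
println(values.mkString(", "))
```

Supplying an explicit schema on read, with `value` declared as a string, is another option that avoids changing a session-wide setting.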

People

Assignee: Unassigned

Reporter: Shankar Koirala

Votes: 0

Watchers: 2

Dates

Created: 01/Jul/20 08:26

Updated: 01/Jul/20 12:39

Resolved: 01/Jul/20 12:39