Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Won't Fix
0.7.0
None
Description
Script to reproduce in local Spark:
https://gist.github.com/nsivabalan/7250b794788516f1aec35650c2632364
```
scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, id, __op from hudi_trips_snapshot order by _hoodie_record_key").show(false)
+-------------------+------------------+----------------------+---+----+
|_hoodie_commit_time|_hoodie_record_key|_hoodie_partition_path|id |__op|
+-------------------+------------------+----------------------+---+----+
|20210210070347     |1                 |1970-01-01            |1  |null|
|20210210070347     |2                 |1970-01-01            |2  |null|
|20210210070347     |3                 |2020-01-04            |3  |D   |
|20210210070347     |4                 |1998-04-13            |4  |I   |
|20210210070347     |5                 |2020-01-01            |5  |I   |
|20210210070445     |6                 |1998-04-13            |6  |I   |
+-------------------+------------------+----------------------+---+----+
```
After an upsert, the read-optimized query returns records from both commits, C1 and C2.
Also, I don't find any log files in the partitions; all of them are parquet files.
ls /tmp/hudi_trips_cow/1998-04-13/
0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet
0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet
ls /tmp/hudi_trips_cow/1970-01-01/
7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet
7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet
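To make the listings above easier to read, here is a minimal sketch that groups the parquet file names by their leading file ID and lists the trailing timestamps for each. It assumes the names follow a `<fileId>_<writeToken>_<instantTime>.parquet` layout; `BaseFile`, `BaseFileNames`, `parse` and `versionsByFileId` are hypothetical helpers written for this illustration, not part of Hudi's API. It can be pasted into spark-shell or a plain Scala REPL.

```scala
// Hypothetical helper types for illustration only; not part of Hudi's API.
final case class BaseFile(fileId: String, writeToken: String, instantTime: String)

object BaseFileNames {
  // Assumed layout: <fileId>_<writeToken>_<instantTime>.parquet
  private val Pattern = """^(.+)_(\d+-\d+-\d+)_(\d+)\.parquet$""".r

  // Returns None for any name that does not match the assumed layout.
  def parse(name: String): Option[BaseFile] = name match {
    case Pattern(fileId, token, instant) => Some(BaseFile(fileId, token, instant))
    case _                               => None
  }

  // Groups names by file ID and sorts each group's instant times.
  // The timestamps are fixed-width digit strings, so string order is time order.
  def versionsByFileId(names: Seq[String]): Map[String, Seq[String]] =
    names.flatMap(parse).groupBy(_.fileId).map {
      case (id, files) => id -> files.map(_.instantTime).sorted
    }
}

// File names copied from the two partition listings in this report.
val listing = Seq(
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet",
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet",
  "7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet",
  "7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet"
)

BaseFileNames.versionsByFileId(listing).foreach { case (id, instants) =>
  println(s"$id -> ${instants.mkString(", ")}")
}
```

Under that assumption, each partition holds two parquet files that share one file ID and differ only in write token and timestamp, which is what one would expect if each write produced a new version of the same file rather than a separate log file.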
Source of the issue: https://github.com/apache/hudi/issues/2255