Description
We take a snapshot programmatically and read its data using both MapReduce and Spark. Both approaches return duplicate records.
On the API side, {{org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}} is used.
The snapshot was taken while the table was in the middle of a region split.
We suspect that data is being returned for both the parent and the daughter regions.
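To illustrate the suspicion, here is a toy model of the situation. It is not HBase code: the {{Region}} class, the {{scan}} helper and the sample keys are all hypothetical, and it only assumes that a snapshot taken mid-split lists the parent region alongside its two daughters, with each listed region scanned as its own input split.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSnapshotSketch {
    // Hypothetical stand-in for a region entry in a snapshot; not an HBase class.
    static final class Region {
        final String name;
        final String startKey;   // inclusive; "" means unbounded
        final String endKey;     // exclusive; "" means unbounded
        final boolean splitParent;

        Region(String name, String startKey, String endKey, boolean splitParent) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.splitParent = splitParent;
        }
    }

    // Emits every row once per listed region whose key range contains it.
    static List<String> scan(List<Region> regions, List<String> rows, boolean skipSplitParents) {
        List<String> out = new ArrayList<>();
        for (Region r : regions) {
            if (skipSplitParents && r.splitParent) {
                continue; // the daughters already cover the parent's key range
            }
            for (String row : rows) {
                boolean afterStart = row.compareTo(r.startKey) >= 0;
                boolean beforeEnd = r.endKey.isEmpty() || row.compareTo(r.endKey) < 0;
                if (afterStart && beforeEnd) {
                    out.add(row);
                }
            }
        }
        return out;
    }

    // Sample rows (assumed data for the illustration).
    static List<String> demoRows() {
        return List.of("a", "c", "m", "x");
    }

    // A parent covering the whole key space plus two daughters split at "m".
    static List<Region> demoRegions() {
        return List.of(
            new Region("parent", "", "", true),
            new Region("daughterA", "", "m", false),
            new Region("daughterB", "m", "", false));
    }

    public static void main(String[] args) {
        System.out.println(scan(demoRegions(), demoRows(), false)); // [a, c, m, x, a, c, m, x]
        System.out.println(scan(demoRegions(), demoRows(), true));  // [a, c, m, x]
    }
}
```

In this model every row comes back twice when the parent is scanned together with its daughters, and once when split parents are skipped, which matches the duplicates we observe.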
Issue Links
- duplicates HBASE-16011 TableSnapshotScanner and TableSnapshotInputFormat can produce duplicate rows (Resolved)