Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
Description
We often see use cases where users, for example with timeseries data, would like to delete large ranges of data, essentially the equivalent of dropping a partition as supported by RDBMSs. We should support regular expressions or range expressions for the matching keys (obviously including binary keys).
The idea is to store these deletes not with the data, but with the meta data. When we read files, we apply the larger deletes first and then the inline ones. Of course, this should be reserved for a few, but very data-intensive, deletes. It reduces the number of delete markers to write to one, instead of many (often thousands, if not millions). This is different from the BulkDeleteEndpoint introduced in HBASE-6942, but it should support similar Scan-based selectiveness.
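For context, a minimal sketch of what a range delete amounts to with the existing client API (the table name, key layout, and batch size below are made up for illustration): a Scan over the key range followed by one Delete marker per matching row, i.e. exactly the many-markers pattern that the proposal would collapse into a single meta data entry.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeDeleteToday {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("metrics"))) {
      // Scan the key range to be purged, e.g. one month of one sensor's data.
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("sensor42|2013-01-01"));
      scan.setStopRow(Bytes.toBytes("sensor42|2013-02-01"));

      List<Delete> deletes = new ArrayList<>();
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          deletes.add(new Delete(r.getRow()));  // one delete marker per row
          if (deletes.size() >= 1000) {         // flush in batches
            table.delete(deletes);
            deletes.clear();
          }
        }
      }
      if (!deletes.isEmpty()) {
        table.delete(deletes);
      }
    }
  }
}
```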
The new range deletes will mask out all matching data and otherwise be handled like other deletes, for example being dropped during major compactions once all of the masked data has been dropped as well.
Still to be discussed is how and where we store the delete entry in practice, since the meta data might not be the desired place; it does seem like a reasonable choice, though. The DeleteTracker can handle such a delete the same way as today, with additional checks for wildcards/ranges. If range deletes are not used, no critical path is affected, so there are no additional latencies or other regressions.
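As a rough illustration of the DeleteTracker idea, here is a self-contained sketch (not the actual DeleteTracker interface; the RangeDelete fields and method names are assumptions for illustration): range markers loaded once from the meta data mask matching cells before the regular per-cell delete checks run.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Sketch only: shows where a range-delete check could sit next to the
 * existing per-cell delete checks during scans and compactions.
 */
public class RangeDeleteTrackerSketch {
  /** A range delete marker: rows in [startRow, stopRow) masked up to maxTimestamp. */
  static class RangeDelete {
    final byte[] startRow;
    final byte[] stopRow;
    final long maxTimestamp;
    RangeDelete(byte[] startRow, byte[] stopRow, long maxTimestamp) {
      this.startRow = startRow;
      this.stopRow = stopRow;
      this.maxTimestamp = maxTimestamp;
    }
  }

  private final List<RangeDelete> rangeDeletes = new ArrayList<>();

  /** Loaded once per scan/compaction from the store's meta data, before any inline deletes. */
  void addRangeDelete(RangeDelete rd) {
    rangeDeletes.add(rd);
  }

  /** Returns true if the given cell is masked by any stored range delete. */
  boolean isDeleted(byte[] row, long timestamp) {
    for (RangeDelete rd : rangeDeletes) {
      if (Bytes.compareTo(row, rd.startRow) >= 0
          && Bytes.compareTo(row, rd.stopRow) < 0
          && timestamp <= rd.maxTimestamp) {
        return true;
      }
    }
    // Not masked by a range delete; fall through to the normal per-cell delete checks.
    return false;
  }
}
```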