Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- Reviewed
Description
Presently, space quota enforcement relies on RegionServers sending reports to the Master about each Region that they host. This is done by periodically reading the cached size of each HFile in each Region (which was ultimately computed from HDFS).
This means that the Master is unaware of Region size growth until the next time this chore fires in a RegionServer, which adds a fair amount of latency (a few minutes, by default). The size changes caused by operations like flushes, compactions, and bulk loads are reported late, even though the RegionServer runs those operations locally.
Instead, we can create an API which these operations could invoke that would automatically update the size of the Region being operated on. For example, a successful flush can report that the size of a Region increased by the size of the flush. A compaction can subtract the size of the input files of the compaction and add in the size of the resulting file.
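As a rough illustration of the proposed API, the sketch below keeps a per-Region size in memory and lets operations apply deltas to it. The class `RegionSizeTracker` and its method names are hypothetical and are not taken from the HBase code base; Regions are identified by a plain string for simplicity.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory store of Region sizes, updated incrementally. */
class RegionSizeTracker {
  // Region name -> current size in bytes.
  private final ConcurrentHashMap<String, AtomicLong> sizes = new ConcurrentHashMap<>();

  /** Applies a signed change, in bytes, to the tracked size of a Region. */
  void incrementRegionSize(String regionName, long deltaBytes) {
    sizes.computeIfAbsent(regionName, k -> new AtomicLong()).addAndGet(deltaBytes);
  }

  /** A successful flush grows the Region by the size of the flushed file. */
  void reportFlush(String regionName, long flushedFileBytes) {
    incrementRegionSize(regionName, flushedFileBytes);
  }

  /** A compaction subtracts its input files and adds the resulting file. */
  void reportCompaction(String regionName, Collection<Long> inputFileBytes,
      long outputFileBytes) {
    long removed = 0L;
    for (long bytes : inputFileBytes) {
      removed += bytes;
    }
    incrementRegionSize(regionName, outputFileBytes - removed);
  }

  /** Returns the tracked size, or 0 if the Region has never been reported. */
  long getRegionSize(String regionName) {
    AtomicLong size = sizes.get(regionName);
    return size == null ? 0L : size.get();
  }
}
```

With this shape, a flush of 100 bytes followed by a flush of 50 bytes leaves the Region at 150 bytes, and compacting those two files into a single 120-byte file brings it to 120 bytes.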
This decouples the computation of a Region's size from the sending of Region sizes to the Master. Reports can then be sent more frequently, which makes the cluster more responsive to size changes.
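The decoupling can be sketched as follows, assuming a hypothetical `RegionSizeReporter` whose `masterSink` callback stands in for the RPC that sends sizes to the Master. Size updates and reporting are separate calls, so a report is only a cheap copy of the in-memory map and can run as often as desired. None of these names come from HBase itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical reporter: size updates and reporting are independent steps. */
class RegionSizeReporter {
  // Region name -> current size in bytes, maintained by local operations.
  private final ConcurrentHashMap<String, Long> sizes = new ConcurrentHashMap<>();
  // Stand-in for the call that delivers Region sizes to the Master.
  private final Consumer<Map<String, Long>> masterSink;

  RegionSizeReporter(Consumer<Map<String, Long>> masterSink) {
    this.masterSink = masterSink;
  }

  /** Called by flushes, compactions, and bulk loads as they complete. */
  void applyDelta(String regionName, long deltaBytes) {
    sizes.merge(regionName, deltaBytes, Long::sum);
  }

  /** Sends a copy of the current sizes; no file system scan is needed. */
  void reportOnce() {
    masterSink.accept(new HashMap<>(sizes));
  }
}
```

In a real RegionServer, `reportOnce` would presumably be driven by a periodic chore; the point of the sketch is only that its cost no longer depends on recomputing file sizes.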
Attachments
Issue Links
- blocks
  - HBASE-18135 Track file archival for low latency space quota with snapshots (Resolved)
- is blocked by
  - HBASE-17752 Update reporting RPCs/Shell commands to break out space utilization by snapshot (Resolved)
  - HBASE-17840 Update book (Resolved)
- relates to
  - HBASE-18134 Re-think if the FileSystemUtilizationChore is still necessary (Resolved)
- links to