[HBASE-28672] Ensure large batches are not indefinitely blocked by quotas - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1
Component/s: Quotas
Labels:
- pull-request-available

Description

At my day job we are trying to implement default quotas for a variety of access patterns. We began by introducing a default read IO limit per-user, per-machine — this has been very successful in reducing hotspots, even on clusters with thousands of distinct users.

While implementing a default writes/second throttle, I realized that doing so would put us in a precarious situation where large-enough batches may never succeed. If your batch size is greater than your TimeLimiter's max throughput, then the estimated cost of the batch exceeds anything the limiter can ever have available, so you will always fail in the quota estimation stage, no matter how long you back off. Meanwhile, IO estimates are deliberately more optimistic, which lets large requests oversubscribe an IO quota in a controlled way:

// assume 1 block required for reads. this is probably a low estimate, which is okay
readConsumed = numReads > 0 ? blockSizeBytes : 0;

This is okay because the Limiter's availability will go negative and force a longer backoff on subsequent requests. I believe this is preferable UX compared to a doomed throttling loop.

In my opinion, we should do something similar in batch request estimation, by estimating a batch request's workload at Math.min(batchSize, limiterMaxThroughput) rather than simply batchSize.
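
To make the proposal concrete, here is a minimal sketch of the difference between the two estimates. It is not HBase's quota code: ToyLimiter, naiveEstimate, cappedEstimate and intervalsUntilAdmitted are hypothetical names invented for illustration, and the refill model (one full limit of permits per interval, capped at the limit) is an assumption.

```java
/** Hypothetical sketch only; these are not HBase's actual quota classes. */
public class BatchEstimateSketch {

    /** Toy limiter: `limit` permits per refill interval; `available` may go negative. */
    static final class ToyLimiter {
        final long limit;
        long available;

        ToyLimiter(long limit) {
            this.limit = limit;
            this.available = limit;
        }

        /** Admission check made against the *estimated* cost. */
        boolean canExecute(long estimate) {
            return estimate <= available;
        }

        /** Charge the *actual* cost; oversubscription drives `available` negative. */
        void consume(long actual) {
            available -= actual;
        }

        /** One interval elapses: add `limit` permits, never exceeding `limit`. */
        void refill() {
            available = Math.min(limit, available + limit);
        }
    }

    /** Estimate a batch at its full size (the behavior the ticket describes as problematic). */
    static long naiveEstimate(long batchSize) {
        return batchSize;
    }

    /** Proposed estimate: never ask for more than the limiter could ever hold. */
    static long cappedEstimate(long batchSize, long limiterMaxThroughput) {
        return Math.min(batchSize, limiterMaxThroughput);
    }

    /** Intervals waited before admission, or -1 if not admitted within maxIntervals. */
    static long intervalsUntilAdmitted(ToyLimiter limiter, long estimate, int maxIntervals) {
        for (int i = 0; i <= maxIntervals; i++) {
            if (limiter.canExecute(estimate)) {
                return i;
            }
            limiter.refill();
        }
        return -1;
    }

    public static void main(String[] args) {
        long limit = 100;     // assumed writes allowed per interval
        long batchSize = 500; // batch larger than the limiter can ever hold

        ToyLimiter naive = new ToyLimiter(limit);
        System.out.println("naive estimate admitted after intervals: "
            + intervalsUntilAdmitted(naive, naiveEstimate(batchSize), 1000));

        ToyLimiter capped = new ToyLimiter(limit);
        System.out.println("capped estimate admitted after intervals: "
            + intervalsUntilAdmitted(capped, cappedEstimate(batchSize, limit), 1000));

        // The batch is charged at its real size, so the next caller backs off longer.
        capped.consume(batchSize);
        System.out.println("next batch admitted after intervals: "
            + intervalsUntilAdmitted(capped, cappedEstimate(batchSize, limit), 1000));
    }
}
```

In this toy model the full-size estimate is never admitted, while the capped estimate is admitted once and the resulting negative availability delays the following request, which is the same trade-off the read IO estimate already makes.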