[GEODE-7039] Server recovery severely degrades client read traffic (no SingleHop no TX) on redundant partitioned persistent regions - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.11.0
Component/s: client/server
Labels:
- needs-review
- pull-request-available

Description

A client that uses neither single hop nor transactions experiences severe throttling from the cluster when getting data from a partitioned persistent region while a server hosting one of the redundant buckets is recovering (in the process of image recovery). A get operation that has not landed on a server hosting the bucket is proxied to one of the members that do host the bucket, chosen at random. This random pick has a nasty consequence: the chosen server might be the one currently recovering, whose bucket is not yet ready (BucketNotFoundException), in which case the local server handles the resulting ForceReattemptException by sleeping 100 ms before making another (random) attempt. This sleeping is devastating for the throughput observed by the client.
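
To get a feel for the cost described above, here is a minimal simulation sketch. It is not Geode code: the class and method names are hypothetical, no real sleeping or networking happens, and it assumes each attempt picks independently and uniformly among the bucket hosts. Only the 100 ms back-off and the random choice are taken from the description.

```java
import java.util.Random;

// Hypothetical model for illustration; none of these names come from Geode.
final class ProxiedGetModel {
    // Back-off between attempts, as described in the issue.
    static final long RETRY_SLEEP_MS = 100;

    // Simulated sleep time (ms) that one proxied get accumulates before it
    // reaches a host whose bucket is ready. Assumes at least one ready host.
    static long delayForOneGetMs(boolean[] hostReady, Random rng) {
        long sleptMs = 0;
        while (true) {
            int pick = rng.nextInt(hostReady.length); // random pick among bucket hosts
            if (hostReady[pick]) {
                return sleptMs;
            }
            // Stands in for BucketNotFoundException -> ForceReattemptException -> sleep.
            sleptMs += RETRY_SLEEP_MS;
        }
    }

    // Average simulated delay (ms) over many gets, seeded for repeatability.
    static double averageDelayMs(boolean[] hostReady, int gets, long seed) {
        boolean anyReady = false;
        for (boolean ready : hostReady) {
            anyReady |= ready;
        }
        if (!anyReady || gets <= 0) {
            throw new IllegalArgumentException("need at least one ready host and one get");
        }
        Random rng = new Random(seed);
        long totalMs = 0;
        for (int i = 0; i < gets; i++) {
            totalMs += delayForOneGetMs(hostReady, rng);
        }
        return (double) totalMs / gets;
    }

    public static void main(String[] args) {
        // One redundant copy: two bucket hosts, one of them still recovering.
        boolean[] oneOfTwoRecovering = {true, false};
        System.out.println("average added delay per get (ms): "
            + averageDelayMs(oneOfTwoRecovering, 100_000, 42L));
    }
}
```

Under these assumptions, with one of two bucket hosts recovering, each attempt fails with probability 1/2, so a get sees one failed attempt on average and the mean added delay is about 100 ms per get. That is large compared with the usual latency of a proxied read, which is consistent with the throughput collapse reported here.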

Issue Links

links to

GitHub Pull Request #3955

People

Assignee: Mario Ivanac

Reporter: Mario Ivanac

Votes: 0

Watchers: 3

Dates

Created: 01/Aug/19 05:52

Updated: 30/Dec/19 18:50

Resolved: 04/Sep/19 07:25

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1.5h