[HDDS-4388] Make writeStateMachineTimeout retry count proportional to node failure timeout - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Ozone Datanode
Labels:
- pull-request-available

Target Version/s: 1.1.0

Description

Currently, Ratis retries the "writeStateMachine" call indefinitely in the event of a timeout. When disks are slow or overloaded, or no chunk writer threads are available for a period of 10s, the writeStateMachine call times out after 10s. In such cases, the same write chunk keeps getting retried, causing the same chunk of data to be overwritten repeatedly. The idea here is to abort the request once the node failure timeout is reached.
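A minimal Java sketch of the idea, under stated assumptions: the retry budget is the node failure timeout divided by the per-attempt writeStateMachine timeout, and once that many attempts have timed out the write is aborted instead of being retried again. The class `SyncTimeoutRetryPolicy` and its methods are hypothetical illustrations, not the actual Ozone or Ratis API.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper (not Ozone's real code): bounds writeStateMachine
 * retries so that the total time spent retrying stays within the node
 * failure timeout.
 */
final class SyncTimeoutRetryPolicy {

    private SyncTimeoutRetryPolicy() {}

    /**
     * Attempts allowed so that attempts * syncTimeout <= nodeFailureTimeout,
     * rounded down, with a floor of 1 so at least one attempt is made.
     */
    static int retryCount(Duration nodeFailureTimeout, Duration syncTimeout) {
        long syncMs = syncTimeout.toMillis();
        if (syncMs <= 0) {
            throw new IllegalArgumentException("syncTimeout must be at least 1 ms");
        }
        long count = nodeFailureTimeout.toMillis() / syncMs;
        return (int) Math.max(1, Math.min(count, Integer.MAX_VALUE));
    }

    /**
     * Runs the write, retrying only on TimeoutException, and after
     * maxAttempts timeouts aborts by rethrowing the last one instead of
     * retrying forever. Other exceptions propagate immediately.
     */
    static <T> T runWithBoundedRetry(Callable<T> write, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return write.call();
            } catch (TimeoutException e) {
                last = e; // timed out: try again while attempts remain
            }
        }
        throw last; // budget exhausted: abort the request
    }
}
```

For example, with a node failure timeout of 300 s (a hypothetical value chosen for illustration) and the 10 s timeout mentioned above, `retryCount` yields 30 attempts, so a stuck chunk write is abandoned after about 300 s rather than being rewritten indefinitely. Rounding down keeps the total wait within the node failure timeout.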

Issue Links

causes:
- HDDS-10717 nodeFailureTimeoutMs should be initialized before syncTimeoutRetry (Resolved)

relates to:
- HDDS-9821 XceiverServerRatis SyncTimeoutRetry is overridden (Resolved)

links to:
- GitHub Pull Request #1519

People

Assignee: Shashikant Banerjee

Reporter: Shashikant Banerjee

Votes: 0

Watchers: 1

Dates

Created: 23/Oct/20 07:31

Updated: 22/Apr/24 03:28

Resolved: 27/Oct/20 07:13