[HDFS-14997] BPServiceActor processes commands from NameNode asynchronously - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.3.0, 3.2.3, 3.2.4
Component/s: datanode
Labels: None
Hadoop Flags: Reviewed

Description

The main process flow of #BPServiceActor has two core functions: reporting (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If #processCommand takes a long time, it blocks the reporting flow. #processCommand can take a very long time (over 1000s in the worst case I have seen) when the IO load of the DataNode is very high. Because some IO operations run under #datasetLock, processing some commands (such as #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In that case, #heartbeat is not sent to the NameNode in time, which triggers other disasters.
I propose making #processCommand asynchronous so that it does not block #BPServiceActor from sending heartbeats back to the NameNode under high IO load.
Notes:
1. Lifeline could be an effective solution; however, some old branches do not support this feature.
2. IO operations under #datasetLock are a separate issue; I think we should solve that in another JIRA.
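
To illustrate the proposal, here is a minimal sketch of the general pattern: the reporting loop only enqueues commands, and a separate thread drains the queue. It is not the code from the attached patches. The class `AsyncCommandSketch`, its methods, and the use of `Runnable` as a stand-in for a NameNode command are all hypothetical names chosen for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of asynchronous command processing; not the real BPServiceActor. */
final class AsyncCommandSketch {
  // Commands received in heartbeat responses wait here until the worker picks them up.
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final Thread worker;

  AsyncCommandSketch() {
    worker = new Thread(this::processQueue, "command-processor");
    worker.setDaemon(true);
    worker.start();
  }

  /** Worker loop: runs commands one at a time, off the reporting thread. */
  private void processQueue() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Runnable command = queue.take(); // blocks until a command arrives
        try {
          command.run(); // may be slow, e.g. while waiting for a lock
        } catch (RuntimeException e) {
          // Assumption of this sketch: one failed command should not stop the worker.
          System.err.println("command failed: " + e);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // shutdown requested
    }
  }

  /** Called from the reporting loop; returns immediately. */
  void enqueue(Runnable command) {
    queue.add(command);
  }

  void shutdown() {
    worker.interrupt();
  }

  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AsyncCommandSketch actor = new AsyncCommandSketch();
    CountDownLatch slowDone = new CountDownLatch(1);

    // Simulate a slow command (for example, one stuck behind a dataset lock).
    actor.enqueue(() -> {
      sleepQuietly(500);
      slowDone.countDown();
    });

    // The reporting loop keeps going while the slow command is still running.
    for (int i = 0; i < 3; i++) {
      System.out.println("heartbeat " + i + " sent, slow command pending: "
          + (slowDone.getCount() > 0));
    }

    slowDone.await(5, TimeUnit.SECONDS);
    actor.shutdown();
  }
}
```

With this shape, a slow command delays only the commands queued behind it, not the heartbeat itself. The trade-off is that commands are now applied later than the heartbeat that delivered them, so ordering and worker-thread failure need separate care.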

Attachments

HDFS-14997-branch-3.2.001.patch, 15/Sep/21 03:48, 11 kB, Xiaoqiao He
HDFS-14997.addendum.patch, 26/Dec/19 09:35, 0.7 kB, Xiaoqiao He
image-2019-12-26-16-15-44-814.png, 26/Dec/19 08:15, 62 kB, Zhenyu Zheng
HDFS-14997.005.patch, 27/Nov/19 08:17, 11 kB, Xiaoqiao He
HDFS-14997.004.patch, 26/Nov/19 17:04, 11 kB, Xiaoqiao He
HDFS-14997.003.patch, 22/Nov/19 11:12, 9 kB, Xiaoqiao He
HDFS-14997.002.patch, 21/Nov/19 15:42, 7 kB, Xiaoqiao He
HDFS-14997.001.patch, 20/Nov/19 05:04, 4 kB, Xiaoqiao He

Issue Links

breaks
HBASE-26970 TestMetaFixed fails reliably with Hadoop 3.2.3 and Hadoop 3.3.2 (Resolved)

is related to
HDFS-15651 Client could not obtain block when DN CommandProcessingThread exit (Resolved)

relates to
HDFS-15113 Missing IBR when NameNode restart if open processCommand async feature (Resolved)
HDFS-15651 Client could not obtain block when DN CommandProcessingThread exit (Resolved)
HDFS-16586 Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit' (Resolved)
HDFS-15075 Remove process command timing from BPServiceActor (Resolved)

People

Assignee: Xiaoqiao He

Reporter: Xiaoqiao He

Votes: 0

Watchers: 24

Dates

Created: 20/Nov/19 04:14

Updated: 20/May/22 23:13

Resolved: 17/Sep/21 14:20