Details
Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.4.1
Fix Version/s: None
Description
Problem statement
I observed that the YARN cluster had both pending and available resources, yet cluster utilization was usually only around ~50%. The cluster was loaded with 200 parallel PI example jobs (from hadoop-mapreduce-examples), each configured with 20 map and 20 reduce containers, on a 50-node cluster where every node had 8 cores and plenty of memory (so CPU was the bottleneck).
Eventually I realized that the RM had an IO bottleneck and needed 1-20 seconds to persist a single RMStateStoreEvent (using FileSystemRMStateStore).
To reduce the impact of the issue:
- create a dispatcher that can persist events on parallel threads
- create metric data for the RMStateStore event queue so the problem can be identified easily if it occurs on a cluster
Another way to identify the issue is to check whether applications need too much time to have their info stored after reaching the NEW_SAVING state.
Solution
Created a MultiDispatcher class which implements the Dispatcher interface (a simplified sketch of the idea follows the list below).
The dispatcher creates a separate metrics object called "Event metrics for rm-state-store", where we can see:
- how many unhandled events are currently in the event queue for each event type
- how many events have been handled for each event type
- the average execution time for each event type
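To illustrate the idea of persisting events on parallel threads, here is a minimal, hypothetical sketch: events are routed to a fixed-size thread pool instead of a single dispatcher thread, so one slow state-store write does not serialize all the others. Class and method names below are simplified placeholders and do not come from the actual patch.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified sketch of a multi-threaded dispatcher.
interface Event { Enum<?> getType(); }
interface EventHandler<T extends Event> { void handle(T event); }

public class MultiThreadDispatcherSketch {
  private final ExecutorService pool;
  private final Map<Class<?>, EventHandler<Event>> handlers = new ConcurrentHashMap<>();

  public MultiThreadDispatcherSketch(int poolSize) {
    // corresponds to yarn.dispatcher.multi-thread.{}.default-pool-size
    this.pool = Executors.newFixedThreadPool(poolSize);
  }

  public void register(Class<?> eventTypeClass, EventHandler<Event> handler) {
    handlers.put(eventTypeClass, handler);
  }

  public void dispatch(Event event) {
    EventHandler<Event> handler = handlers.get(event.getType().getDeclaringClass());
    if (handler != null) {
      // Each event is handled on a pool thread, so one slow state-store
      // write no longer blocks every other pending event.
      pool.submit(() -> handler.handle(event));
    }
  }

  public void stop(long gracefulStopSeconds) throws InterruptedException {
    // corresponds to yarn.dispatcher.multi-thread.{}.graceful-stop-seconds
    pool.shutdown();
    pool.awaitTermination(gracefulStopSeconds, TimeUnit.SECONDS);
  }
}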
The dispatcher has the following configuration properties (the placeholder is the dispatcher name, for example rm-state-store):
Config name | Description | Default value |
---|---|---|
yarn.dispatcher.multi-thread.{}.default-pool-size | Number of parallel threads used to execute events | 4 |
yarn.dispatcher.multi-thread.{}.queue-size | Size of the event queue | 1000000 |
yarn.dispatcher.multi-thread.{}.monitor-seconds | The size of the event queue is logged with this frequency (if not zero) | 0 |
yarn.dispatcher.multi-thread.{}.graceful-stop-seconds | After the stop signal, the dispatcher waits this many seconds to process the incoming events before terminating them | 60 |
yarn.dispatcher.multi-thread.{}.metrics-enabled | Whether the dispatcher should publish metrics data to the metrics system | false |
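As an illustration of how these keys compose, the snippet below builds the property names for the rm-state-store dispatcher with a plain Hadoop Configuration object. It is only an example; in practice the same values would normally be placed in the cluster configuration, and the chosen values here are arbitrary.

import org.apache.hadoop.conf.Configuration;

public class MultiDispatcherConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The "{}" placeholder in the table is the dispatcher name, here "rm-state-store".
    String prefix = "yarn.dispatcher.multi-thread.rm-state-store.";

    conf.setInt(prefix + "default-pool-size", 8);      // parallel handler threads
    conf.setInt(prefix + "queue-size", 1000000);       // event queue capacity
    conf.setInt(prefix + "monitor-seconds", 30);       // log queue size every 30s
    conf.setInt(prefix + "graceful-stop-seconds", 60); // drain time on shutdown
    conf.setBoolean(prefix + "metrics-enabled", true); // publish to the metrics system

    System.out.println("pool size = " + conf.getInt(prefix + "default-pool-size", 4));
  }
}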
Example JMX output with metrics enabled:
...
{
  "name": "Hadoop:service=ResourceManager,name=Event metrics for rm-state-store",
  "modelerType": "Event metrics for rm-state-store",
  "tag.Context": "yarn",
  "tag.Hostname": CENSORED,
  "RMStateStoreEventType#STORE_APP_ATTEMPT_Current": 51,
  "RMStateStoreEventType#STORE_APP_ATTEMPT_NumOps": 0,
  "RMStateStoreEventType#STORE_APP_ATTEMPT_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_APP_Current": 124,
  "RMStateStoreEventType#STORE_APP_NumOps": 46,
  "RMStateStoreEventType#STORE_APP_AvgTime": 3318.25,
  "RMStateStoreEventType#UPDATE_APP_Current": 31,
  "RMStateStoreEventType#UPDATE_APP_NumOps": 16,
  "RMStateStoreEventType#UPDATE_APP_AvgTime": 2629.6666666666665,
  "RMStateStoreEventType#UPDATE_APP_ATTEMPT_Current": 31,
  "RMStateStoreEventType#UPDATE_APP_ATTEMPT_NumOps": 12,
  "RMStateStoreEventType#UPDATE_APP_ATTEMPT_AvgTime": 2048.6666666666665,
  "RMStateStoreEventType#REMOVE_APP_Current": 12,
  "RMStateStoreEventType#REMOVE_APP_NumOps": 3,
  "RMStateStoreEventType#REMOVE_APP_AvgTime": 1378.0,
  "RMStateStoreEventType#REMOVE_APP_ATTEMPT_Current": 0,
  "RMStateStoreEventType#REMOVE_APP_ATTEMPT_NumOps": 0,
  "RMStateStoreEventType#REMOVE_APP_ATTEMPT_AvgTime": 0.0,
  "RMStateStoreEventType#FENCED_Current": 0,
  "RMStateStoreEventType#FENCED_NumOps": 0,
  "RMStateStoreEventType#FENCED_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_MASTERKEY_Current": 0,
  "RMStateStoreEventType#STORE_MASTERKEY_NumOps": 0,
  "RMStateStoreEventType#STORE_MASTERKEY_AvgTime": 0.0,
  "RMStateStoreEventType#REMOVE_MASTERKEY_Current": 0,
  "RMStateStoreEventType#REMOVE_MASTERKEY_NumOps": 0,
  "RMStateStoreEventType#REMOVE_MASTERKEY_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_DELEGATION_TOKEN_Current": 0,
  "RMStateStoreEventType#STORE_DELEGATION_TOKEN_NumOps": 0,
  "RMStateStoreEventType#STORE_DELEGATION_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_Current": 0,
  "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_NumOps": 0,
  "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_Current": 0,
  "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_NumOps": 0,
  "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#UPDATE_AMRM_TOKEN_Current": 0,
  "RMStateStoreEventType#UPDATE_AMRM_TOKEN_NumOps": 0,
  "RMStateStoreEventType#UPDATE_AMRM_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_RESERVATION_Current": 0,
  "RMStateStoreEventType#STORE_RESERVATION_NumOps": 0,
  "RMStateStoreEventType#STORE_RESERVATION_AvgTime": 0.0,
  "RMStateStoreEventType#REMOVE_RESERVATION_Current": 0,
  "RMStateStoreEventType#REMOVE_RESERVATION_NumOps": 0,
  "RMStateStoreEventType#REMOVE_RESERVATION_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_PROXY_CA_CERT_Current": 0,
  "RMStateStoreEventType#STORE_PROXY_CA_CERT_NumOps": 0,
  "RMStateStoreEventType#STORE_PROXY_CA_CERT_AvgTime": 0.0
},
...
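The Current/NumOps/AvgTime triplets in the dump are the shape that Hadoop's metrics2 library produces from a gauge plus a MutableRate per event type. As a rough, hypothetical sketch of how such per-event-type metrics could be registered (not the actual patch code):

import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical sketch: one gauge plus one rate per event type,
// e.g. "RMStateStoreEventType#STORE_APP". Naming is inferred from the
// JMX output above, not taken from the real implementation.
public class EventTypeMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("Event metrics for rm-state-store");
  private final MutableGaugeInt queued;   // backs the *_Current value
  private final MutableRate handleTime;   // backs the *NumOps / *AvgTime values

  public EventTypeMetricsSketch(String eventTypeName) {
    queued = registry.newGauge(eventTypeName + "_Current",
        "events of this type currently waiting in the queue", 0);
    handleTime = registry.newRate(eventTypeName + "_");
  }

  public void onEnqueue() { queued.incr(); }

  public void onHandled(long elapsedMillis) {
    queued.decr();
    handleTime.add(elapsedMillis); // MutableRate exposes NumOps and AvgTime
  }
}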
Testing
I deployed the MultiDispatcher-enabled build of YARN to the cluster and ran the following performance test:
#!/bin/bash
# Start the perf script on all 50 nodes, let it run for 5 minutes, then stop it.
for i in {1..50}; do
  ssh root@$i-node-url 'nohup ./perf.sh 4 1>/dev/null 2>/dev/null &' &
done
sleep 300
for i in {1..50}; do
  ssh root@$i-node-url "pkill -9 -f perf" &
done
sleep 5
echo "DONE"
Each node ran the following perf script:
#!/bin/bash
# Keep up to $1 concurrent PI example jobs running on this node.
while true; do
  if [ $(ps -o pid= -u hadoop | wc -l) -le $1 ]; then
    hadoop jar /opt/hadoop-mapreduce-examples.jar pi 20 20 1>/dev/null 2>&1 &
  fi
  sleep 1
done
This way, in 5 minutes (plus the time to wait for all jobs to finish) I could process 332 applications.
When I ran the same test with the official build, it took 5 minutes just for the first application to finish; in total only 221 applications completed.
I also tested it with LeveldbRMStateStore and ZKRMStateStore and did not find any problems with the implementation.
Attachments
Issue Links