Details
- Type: Bug
- Status: Reopened
- Priority: Blocker
- Resolution: Unresolved
- Affects Version/s: 3.1.1
- None
- None
Description
The RM hangs and I cannot submit any new jobs, although the RM and NM processes themselves appear normal. I can open xxxxx:8088/cluster/apps/RUNNING but cannot open xxxxx:8088/cluster/scheduler. Applications that were already submitted never finish, and new applications cannot be submitted. Everything hangs except the RM and NM server processes. How can I fix this? Please help!
Here is the log:
ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
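To gauge how tight this loop is, the entries in a saved RM log can be tallied. The following is only a rough sketch: `summarize` is a hypothetical helper name, the two patterns are taken from the log lines quoted above, and the sample string is a shortened copy of those lines rather than new log output.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt in this report. The search runs over
# the whole text, so it works whether or not entries are split by newlines.
REJECT_MSG = "Failed to accept allocation proposal"
ATTEMPT_RE = re.compile(
    r"assignedContainer application attempt=(appattempt_\d+_\d+_\d+) container=(\S+)"
)


def summarize(log_text):
    """Count rejected commits and null-container assignments per app attempt."""
    rejected = log_text.count(REJECT_MSG)
    null_assignments = Counter(
        attempt
        for attempt, container in ATTEMPT_RE.findall(log_text)
        if container == "null"
    )
    return rejected, null_assignments


# Shortened sample built from the two alternating entries above, repeated twice.
sample = (
    "2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler "
    "(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal "
    "2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator "
    "(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - "
    "assignedContainer application attempt=appattempt_1600074574138_66297_000001 "
    "container=null queue=tianqiwang "
) * 2

rejected, per_attempt = summarize(sample)
print("rejected proposals:", rejected)
for attempt, count in per_attempt.most_common():
    print(attempt, "null-container assignments:", count)
```

If one attempt accounts for nearly all of the entries within a single millisecond, as in the excerpt above, that is consistent with the scheduler repeatedly retrying the same allocation for that attempt.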
Attachments
Issue Links
- duplicates YARN-8896 Limit the maximum number of container assignments per heartbeat (Resolved)