Description
On large clusters, enabling Kerberos or running the Kerberos service check throws an NPE for the CHECK_KEYTABS, REMOVE_KEYTAB and SET_KEYTAB commands.
2023-03-06 07:22:00,538 INFO [agent-command-publisher-0] AgentCommandsPublisher:174 - CHECK_KEYTABS called
2023-03-06 07:22:00,538 ERROR [ambari-action-scheduler] AgentCommandsPublisher:126 - Exception on sendAgentCommand
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
    at org.apache.ambari.server.events.publishers.AgentCommandsPublisher.sendAgentCommand(AgentCommandsPublisher.java:124)
    at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:555)
    at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:347)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005)
    ... 4 more
Caused by: java.lang.NullPointerException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
    at org.apache.ambari.server.events.publishers.AgentCommandsPublisher.lambda$sendAgentCommand$1(AgentCommandsPublisher.java:103)
    at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:163)
Caused by: java.lang.NullPointerException
    at org.apache.ambari.server.events.publishers.AgentCommandsPublisher.prepareExecutionCommandsClusters(AgentCommandsPublisher.java:214)
    at org.apache.ambari.server.events.publishers.AgentCommandsPublisher.populateExecutionCommandsClusters(AgentCommandsPublisher.java:192)
    at org.apache.ambari.server.events.publishers.AgentCommandsPublisher.lambda$null$0(AgentCommandsPublisher.java:122)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at com.google.common.collect.CollectSpliterators$1.lambda$forEachRemaining$1(CollectSpliterators.java:116)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at com.google.common.collect.CollectSpliterators$1.forEachRemaining(CollectSpliterators.java:116)
    at com.google.common.collect.CollectSpliterators$1FlatMapSpliterator.lambda$forEachRemaining$1(CollectSpliterators.java:247)
    at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1699)
    at com.google.common.collect.CollectSpliterators$1FlatMapSpliterator.forEachRemaining(CollectSpliterators.java:247)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    ... 4 more
This might be caused by using a TreeMap for executionCommandsClusters in multithreaded operations, so executionCommandsClusters needs to be changed to a thread-safe data structure.
Because of the NPE, the Kerberos service check gets stuck for 30 minutes; the commands are then sent to the agents again, and the service check succeeds.
On large clusters, this also happens multiple times while enabling Kerberos.
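
To illustrate the suspected cause and one possible remedy, here is a minimal sketch, not the actual Ambari code. It assumes the map is filled from a parallel stream (as the ForEachOps frames in the trace suggest) and swaps in java.util.concurrent.ConcurrentSkipListMap, which keeps keys sorted like TreeMap but tolerates concurrent writes. The class name, the populate method, the Long host-id key and the String payload are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.IntStream;

public class ThreadSafeCommandsSketch {

    // Hypothetical stand-in for the per-host population step; not Ambari's real method.
    static Map<Long, Map<String, String>> populate(int hostCount) {
        // A plain TreeMap here could be corrupted by concurrent puts from the
        // parallel stream below; ConcurrentSkipListMap is a sorted, thread-safe alternative.
        Map<Long, Map<String, String>> executionCommandsClusters = new ConcurrentSkipListMap<>();

        IntStream.range(0, hostCount).parallel().forEach(i -> {
            long hostId = i; // assumed key type, for illustration only
            // computeIfAbsent on a ConcurrentMap never loses an insert,
            // and the nested map is thread-safe as well.
            executionCommandsClusters
                .computeIfAbsent(hostId, k -> new ConcurrentSkipListMap<>())
                .put("command", "CHECK_KEYTABS");
        });
        return executionCommandsClusters;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, String>> result = populate(10_000);
        System.out.println("hosts populated: " + result.size());
    }
}
```

If sorted iteration order is not required, ConcurrentHashMap would be another candidate; which structure fits depends on how executionCommandsClusters is consumed downstream.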
Issue Links
- Blocked: AMBARI-26001 ambari2.8 release (Open)