Details
Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
Ozone Manager can be initialized with the 'ozone om --init' command, which connects to a running SCM.
If the SCM is unreachable because of a DNS issue, the initialization fails without any retry:
2018-10-31 15:36:26 ERROR OzoneManager:376 - Could not initialize OM version file
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "releastest2-ozone-scm-0.releastest2-ozone-scm":9863; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:768)
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:449)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
    at org.apache.hadoop.ipc.Client.call(Client.java:1403)
    at org.apache.hadoop.ipc.Client.call(Client.java:1367)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy9.getScmInfo(Unknown Source)
    at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:154)
    at org.apache.hadoop.ozone.om.OzoneManager.omInit(OzoneManager.java:358)
    at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:326)
    at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:265)
Caused by: java.net.UnknownHostException
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
    ... 10 more
This is a problem for all containerized environments. In Kubernetes, OM sometimes cannot be started. For docker-compose environments we have a 15-second sleep to avoid this issue.
It would be great to retry in case of a DNS problem.
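To make the proposal concrete, here is a minimal sketch of what such a retry could look like. It is not the actual fix: the class DnsRetry, the method names, and the attempt/sleep parameters are all hypothetical, and it uses only the JDK rather than any Hadoop retry facility. The idea is to retry a call only when an UnknownHostException appears somewhere in the cause chain (as in the trace above), and to rethrow every other failure immediately.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical helper, for illustration only; not part of Ozone or Hadoop. */
final class DnsRetry {
    private DnsRetry() {}

    /** Returns true if t or any of its causes is an UnknownHostException. */
    static boolean isDnsFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the call, retrying only on DNS resolution failures.
     * maxAttempts and sleepMillis are assumed tuning knobs, not real Ozone settings.
     */
    static <T> T callWithDnsRetry(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up on non-DNS errors, or once the attempt budget is spent.
                if (!isDnsFailure(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Fixed delay between attempts; the host name may resolve later.
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

Under these assumptions, the getScmInfo() call made from omInit could be wrapped in callWithDnsRetry, so that a host name which is not yet resolvable at container startup is retried a bounded number of times instead of failing the initialization on the first attempt.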