Apache Ozone / HDDS-776

Make OM initialization resilient to DNS failures


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Ozone Manager
    • Labels: None

    Description

      Ozone Manager can be initialized with the 'ozone om --init' command, which connects to a running SCM.

      If SCM is unavailable because of a DNS issue, initialization fails without any retry:

       2018-10-31 15:36:26 ERROR OzoneManager:376 - Could not initialize OM version file
      java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "releastest2-ozone-scm-0.releastest2-ozone-scm":9863; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
      	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:768)
      	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:449)
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1367)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
      	at com.sun.proxy.$Proxy9.getScmInfo(Unknown Source)
      	at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:154)
      	at org.apache.hadoop.ozone.om.OzoneManager.omInit(OzoneManager.java:358)
      	at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:326)
      	at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:265)
      Caused by: java.net.UnknownHostException
      	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
      	... 10 more 
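In the trace above, Hadoop's IPC client (NetUtils.wrapException) rethrows the resolution failure as a new UnknownHostException carrying a more detailed message, with the original exception kept as its cause ("Caused by: java.net.UnknownHostException"). A retry condition could therefore recognize a DNS failure by walking the cause chain. The following is a minimal sketch under that assumption; the class DnsFailures and its method isDnsFailure are hypothetical names, not part of Ozone or Hadoop.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not an Ozone/Hadoop API.
public final class DnsFailures {

  private DnsFailures() {
  }

  /**
   * Returns true if the throwable, or any exception in its cause chain,
   * is an UnknownHostException (i.e. host name resolution failed).
   */
  public static boolean isDnsFailure(Throwable t) {
    // Track visited exceptions by identity to stop on cyclic cause chains.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
      if (cur instanceof UnknownHostException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    IOException wrapped = new IOException("RPC failed",
        new UnknownHostException("releastest2-ozone-scm-0.releastest2-ozone-scm"));
    System.out.println(isDnsFailure(wrapped));                                // true
    System.out.println(isDnsFailure(new IOException("connection refused")));  // false
  }
}
```

Checking the whole chain rather than only the top-level type keeps the check working even if some layer (for example an RPC proxy) wraps the exception in a plain IOException.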
      

      This is a problem in all containerized environments. In Kubernetes, OM sometimes fails to start. In docker-compose environments we use a 15-second sleep to avoid this issue.

      It would be great to retry in case of a DNS problem.
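One way to implement the requested retry is to wrap the SCM call made during initialization (getScmInfo, called from omInit in the trace above) in a loop that retries only on UnknownHostException, sleeping with a capped exponential backoff between attempts. This is a hedged sketch, not the patch attached to this issue: DnsRetry, callWithDnsRetry, and the attempt/sleep parameters are hypothetical, and other exceptions are deliberately propagated immediately so that genuine errors are not masked.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical helper, not an Ozone/Hadoop API.
public final class DnsRetry {

  private DnsRetry() {
  }

  /**
   * Calls action, retrying while it fails with UnknownHostException, up to
   * maxAttempts calls in total. The sleep between attempts starts at
   * initialSleepMillis and doubles each time, capped at maxSleepMillis.
   * Any other exception propagates immediately.
   */
  public static <T> T callWithDnsRetry(Callable<T> action, int maxAttempts,
      long initialSleepMillis, long maxSleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long sleep = initialSleepMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (UnknownHostException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up: the host never became resolvable
        }
        System.err.printf("Attempt %d/%d failed: %s; retrying in %d ms%n",
            attempt, maxAttempts, e.getMessage(), sleep);
        Thread.sleep(sleep);
        sleep = Math.min(sleep * 2, maxSleepMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated SCM call: DNS resolution fails twice, then succeeds.
    String result = callWithDnsRetry(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new UnknownHostException("scm not resolvable yet");
      }
      return "scm-info";
    }, 5, 10, 100);
    System.out.println(result + " after " + calls[0] + " attempts");
    // prints "scm-info after 3 attempts"
  }
}
```

Compared with a fixed 15-second sleep, a bounded retry lets OM start as soon as the SCM host name resolves, while still failing with the original UnknownHostException if it never does.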

      Attachments

        1. HDDS-776.001.patch (10 kB, Attila Doroszlai)


          People

            Attila Doroszlai (adoroszlai)
            Marton Elek (elek)
            Votes: 0
            Watchers: 5
