HDDS-6102: Integrating Ozone with Hive produces a thread leak in the HS2 server

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: OFS
    • Labels: None

    Description

      Integrating Ozone with Hive produces a thread leak in HS2. In this sample, 12 open connections to Hive produced 149 threads, and the count kept increasing until HS2 had to be restarted.

      SETTINGS:

      HDFS integration uses the following settings:

      • viewfs-mount-table:
          fs.viewfs.mounttable.clusters.link./cluster1=hdfs://cluster1
          fs.viewfs.mounttable.clusters.link./ozfs1=ofs://ozfs1
      • core-site.xml:
          fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
          fs.AbstractFileSystem.o3fs.impl=org.apache.hadoop.fs.ozone.OzFs
         
      • hdfs-site.xml:
          ozone.om.service.ids=ozfs1
          ozone.om.nodes.ozfs1=om1,om2
          ozone.om.address.ozfs1.om1=ozone1.domain.com:9862
          ozone.om.address.ozfs1.om2=ozone2.domain.com:9862
          dfs.nameservices=cluster1,ozfs1
          ozone.om.kerberos.keytab.file=/etc/security/keytabs/om.service.keytab
          ozone.om.kerberos.principal=om/_HOST@DOMAIN.COM

       

      Hive integration uses the following settings:

      • hive-site.xml:
          tez.job.fs-servers=hdfs://cluster1,ofs://ozfs1
          mapreduce.job.hdfs-servers=hdfs://cluster1,ofs://ozfs1

       

      Hive's thread dump shows many threads like these:

      Thread 4958 (Truststore reloader thread):
        State: TIMED_WAITING
        Blocked count: 0
        Waited count: 221
        Stack:
          java.lang.Thread.$$YJP$$sleep(Native Method)
          java.lang.Thread.sleep(Thread.java)
          org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:195)
          java.lang.Thread.run(Thread.java:748)
      Thread 4948 (Truststore reloader thread):
        State: TIMED_WAITING
        Blocked count: 79
        Waited count: 221
        Stack:
          java.lang.Thread.$$YJP$$sleep(Native Method)
          java.lang.Thread.sleep(Thread.java)
          org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:195)
          java.lang.Thread.run(Thread.java:748)
      Thread 4777 (Truststore reloader thread):
        State: TIMED_WAITING
        Blocked count: 0
        Waited count: 252
        Stack:
          java.lang.Thread.$$YJP$$sleep(Native Method)
          java.lang.Thread.sleep(Thread.java)
          org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:195)
          java.lang.Thread.run(Thread.java:748)
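
      To track how the number of these threads grows between dumps, the dump text can be summarized by thread name. The following is a minimal sketch that assumes the dump uses the header format shown above ("Thread <id> (<name>):"); the class name ThreadDumpSummary and its method are hypothetical helpers, not part of Hive, Hadoop, or Ozone.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadDumpSummary {
    // Matches header lines such as "Thread 4958 (Truststore reloader thread):".
    // Assumption: every thread entry in the dump starts with a line in this format.
    private static final Pattern HEADER =
        Pattern.compile("^\\s*Thread \\d+ \\((.+)\\):\\s*$", Pattern.MULTILINE);

    /** Returns the number of thread entries in the dump text, grouped by thread name. */
    public static Map<String, Integer> countByName(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = HEADER.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened, illustrative dump; a real dump would be read from a file.
        String dump = String.join("\n",
            "Thread 4958 (Truststore reloader thread):",
            "  State: TIMED_WAITING",
            "Thread 4948 (Truststore reloader thread):",
            "  State: TIMED_WAITING",
            "Thread 12 (main):",
            "  State: RUNNABLE");
        System.out.println(countByName(dump));
    }
}
```

      Running this against dumps taken a few minutes apart would show whether the "Truststore reloader thread" count keeps climbing while the number of open connections stays flat.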

       

      Using YourKit, we identified the following two stack traces as the places where these threads are created:

      {{java.lang.Thread.<init>(Runnable, String) Thread.java
      org.apache.hadoop.security.ssl.ReloadingX509TrustManager.init() ReloadingX509TrustManager.java:95
      org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(SSLFactory$Mode) FileBasedKeyStoresFactory.java:223
      org.apache.hadoop.security.ssl.SSLFactory.init() SSLFactory.java:180
      org.apache.hadoop.yarn.client.api.impl.TimelineConnector.getSSLFactory(Configuration) TimelineConnector.java:181
      org.apache.hadoop.yarn.client.api.impl.TimelineConnector.serviceInit(Configuration) TimelineConnector.java:108
      org.apache.hadoop.service.AbstractService.init(Configuration) AbstractService.java:164
      org.apache.hadoop.service.CompositeService.serviceInit(Configuration) CompositeService.java:108
      org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(Configuration) TimelineClientImpl.java:130
      org.apache.hadoop.service.AbstractService.init(Configuration) AbstractService.java:164
      org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken() YarnClientImpl.java:405
      org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(ContainerLaunchContext) YarnClientImpl.java:381
      org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(ApplicationSubmissionContext) YarnClientImpl.java:300
      org.apache.tez.client.TezYarnClient.submitApplication(ApplicationSubmissionContext) TezYarnClient.java:77
      org.apache.tez.client.TezClient.start() TezClient.java:402
      org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezClient, HiveConf, Map, TezConfiguration, boolean) TezSessionState.java:516
      org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(String[], boolean, SessionState$LogHelper, TezSessionState$HiveResources) TezSessionState.java:451
      org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(String[], boolean, SessionState$LogHelper, TezSessionState$HiveResources) TezSessionPoolSession.java:124
      org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(String[]) TezSessionState.java:373
      org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezSessionState, String[]) TezTask.java:373
      org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(DriverContext) TezTask.java:200
      org.apache.hadoop.hive.ql.exec.Task.executeTask(HiveHistory) Task.java:212
      org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential() TaskRunner.java:103
      org.apache.hadoop.hive.ql.Driver.launchTask(Task, String, boolean, String, int, DriverContext) Driver.java:2712
      org.apache.hadoop.hive.ql.Driver.execute() Driver.java:2383
      org.apache.hadoop.hive.ql.Driver.runInternal(String, boolean) Driver.java:2055
      org.apache.hadoop.hive.ql.Driver.run(String, boolean) Driver.java:1753
      org.apache.hadoop.hive.ql.Driver.run() Driver.java:1747
      org.apache.hadoop.hive.ql.reexec.ReExecDriver.run() ReExecDriver.java:157
      org.apache.hive.service.cli.operation.SQLOperation.runQuery() SQLOperation.java:226
      org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation) SQLOperation.java:87
      org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run() SQLOperation.java:324
      javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) Subject.java
      org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) UserGroupInformation.java:1729
      org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run() SQLOperation.java:342
      java.util.concurrent.Executors$RunnableAdapter.call() Executors.java:511
      java.util.concurrent.FutureTask.run() FutureTask.java:266
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1149
      java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
      java.lang.Thread.run() Thread.java:748


      java.lang.Thread.<init>(Runnable, String) Thread.java
      org.apache.hadoop.security.ssl.ReloadingX509TrustManager.init() ReloadingX509TrustManager.java:95
      org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(SSLFactory$Mode) FileBasedKeyStoresFactory.java:223
      org.apache.hadoop.security.ssl.SSLFactory.init() SSLFactory.java:180
      org.apache.hadoop.crypto.key.kms.KMSClientProvider.<init>(URI, Configuration) KMSClientProvider.java:390
      org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProviders(Configuration, URL, int, String) KMSClientProvider.java:318
      org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProvider(URI, Configuration) KMSClientProvider.java:303
      org.apache.hadoop.crypto.key.KeyProviderFactory.get(URI, Configuration) KeyProviderFactory.java:96
      org.apache.hadoop.util.KMSUtil.createKeyProviderFromUri(Configuration, URI) KMSUtil.java:83
      org.apache.hadoop.ozone.client.rpc.OzoneKMSUtil.getKeyProvider(ConfigurationSource, URI) OzoneKMSUtil.java:138
      org.apache.hadoop.ozone.client.rpc.RpcClient.getKeyProvider() RpcClient.java:1310
      org.apache.hadoop.ozone.client.ObjectStore.getKeyProvider() ObjectStore.java:222
      org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getKeyProvider() BasicRootedOzoneClientAdapterImpl.java:785
      org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.getKeyProvider() RootedOzoneFileSystem.java:54
      org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.getAdditionalTokenIssuers() RootedOzoneFileSystem.java:67
      org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer, String, Credentials, List) DelegationTokenIssuer.java:104
      org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(String, Credentials) DelegationTokenIssuer.java:76
      org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(FileSystem, Credentials, Configuration) TokenCache.java:140
      org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(Credentials, Path[], Configuration) TokenCache.java:101
      org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(Credentials, Path[], Configuration) TokenCache.java:77
      org.apache.tez.client.TezClientUtils.populateTokenCache(TezConfiguration, Credentials) TezClientUtils.java:746
      org.apache.tez.client.TezClientUtils.prepareAmLaunchCredentials(AMConfiguration, Credentials, TezConfiguration, Path) TezClientUtils.java:722
      org.apache.tez.client.TezClientUtils.createApplicationSubmissionContext(ApplicationId, DAG, String, AMConfiguration, Map, Credentials, boolean, TezApiVersionInfo, ServicePluginsDescriptor, JavaOptsChecker) TezClientUtils.java:487
      org.apache.tez.client.TezClient.setupApplicationContext() TezClient.java:501
      org.apache.tez.client.TezClient.start() TezClient.java:401
      org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezClient, HiveConf, Map, TezConfiguration, boolean) TezSessionState.java:516
      org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(String[], boolean, SessionState$LogHelper, TezSessionState$HiveResources) TezSessionState.java:451
      org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(String[], boolean, SessionState$LogHelper, TezSessionState$HiveResources) TezSessionPoolSession.java:124
      org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(String[]) TezSessionState.java:373
      org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezSessionState, String[]) TezTask.java:373
      org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(DriverContext) TezTask.java:200
      org.apache.hadoop.hive.ql.exec.Task.executeTask(HiveHistory) Task.java:212
      org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential() TaskRunner.java:103
      org.apache.hadoop.hive.ql.Driver.launchTask(Task, String, boolean, String, int, DriverContext) Driver.java:2712
      org.apache.hadoop.hive.ql.Driver.execute() Driver.java:2383
      org.apache.hadoop.hive.ql.Driver.runInternal(String, boolean) Driver.java:2055
      org.apache.hadoop.hive.ql.Driver.run(String, boolean) Driver.java:1753
      org.apache.hadoop.hive.ql.Driver.run() Driver.java:1747
      org.apache.hadoop.hive.ql.reexec.ReExecDriver.run() ReExecDriver.java:157
      org.apache.hive.service.cli.operation.SQLOperation.runQuery() SQLOperation.java:226
      org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation) SQLOperation.java:87
      org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run() SQLOperation.java:324
      javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) Subject.java
      org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) UserGroupInformation.java:1729
      org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run() SQLOperation.java:342
      java.util.concurrent.Executors$RunnableAdapter.call() Executors.java:511
      java.util.concurrent.FutureTask.run() FutureTask.java:266
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1149
      java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
      java.lang.Thread.run() Thread.java:748}}

       

      This looks similar to HDFS-14037.
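
      Both traces end in an init() call that starts a background thread. One plausible reading, which is an assumption and not confirmed in this report, is that each new Tez session initializes a fresh factory and never destroys it, so its reloader thread outlives the session. The sketch below reproduces that pattern in isolation. The Reloader class, its thread name, and the leakedAfter helper are hypothetical stand-ins and not Hadoop code.

```java
public class ReloaderLeakSketch {
    static final String THREAD_NAME = "Sketch reloader thread";

    /** Hypothetical stand-in for a component whose init() starts a reloader thread. */
    static class Reloader {
        private volatile boolean running;
        private Thread thread;

        void init() {
            running = true;
            thread = new Thread(() -> {
                while (running) {
                    try {
                        Thread.sleep(10_000); // pretend to poll a truststore file
                    } catch (InterruptedException e) {
                        return; // destroy() interrupts the sleep to stop the thread
                    }
                }
            }, THREAD_NAME);
            thread.setDaemon(true);
            thread.start();
        }

        /** Stops the reloader thread and waits for it to exit. */
        void destroy() {
            running = false;
            thread.interrupt();
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Counts live threads in this JVM that carry the sketch's thread name. */
    public static int liveReloaders() {
        int n = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (THREAD_NAME.equals(t.getName()) && t.isAlive()) {
                n++;
            }
        }
        return n;
    }

    /** Simulates session starts and returns how many reloader threads are left behind. */
    public static int leakedAfter(int sessions, boolean destroyAfterUse) {
        int before = liveReloaders();
        for (int i = 0; i < sessions; i++) {
            Reloader r = new Reloader();
            r.init();
            if (destroyAfterUse) {
                r.destroy();
            }
        }
        return liveReloaders() - before;
    }

    public static void main(String[] args) {
        System.out.println("leaked with destroy:    " + leakedAfter(12, true));
        System.out.println("leaked without destroy: " + leakedAfter(12, false));
    }
}
```

      In this sketch the leak disappears once every init() is paired with a destroy(). Whether the same pairing is what is missing in the Tez, YARN timeline client, or KMS client code paths above would need to be checked against those projects.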

            People

              Assignee: Unassigned
              Reporter: Diego Jaramillo (djcelis)