  1. IMPALA
  2. IMPALA-5429

Use a thread pool to load block metadata in parallel

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Catalog

    Description

      Metadata loading for tables with many partitions can be fairly slow, especially on S3 and ADLS. The operation is largely latency-bound, so using multiple threads should help speed up the process.

      Listing files from multiple partitions in parallel should provide a good speedup, especially for S3 and ADLS, where latencies are usually higher than on HDFS.

      HdfsTable.loadPartitionFileMetadata(StorageDescriptor, HdfsPartition) might be a good starting point.
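
      The idea of listing partitions in parallel can be sketched roughly as follows. This is a minimal illustration, not Impala's actual code or the fix that was committed: the class ParallelPartitionLoader, the method loadAll, and the lister callback are all hypothetical names, and partition locations are plain strings rather than Hadoop Path objects. The lister stands in for whatever per-partition file listing call is slow on S3 or ADLS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper (not part of Impala): fans per-partition listing
// calls out to a fixed-size thread pool and gathers the results.
class ParallelPartitionLoader {

    // Lists every partition in partitionPaths using up to poolSize threads.
    // 'lister' is an assumed stand-in for the latency-bound listing call.
    // Returns partition path -> listed files, in the input order.
    static Map<String, List<String>> loadAll(
            List<String> partitionPaths,
            Function<String, List<String>> lister,
            int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one task per partition so the slow calls overlap.
            Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
            for (String path : partitionPaths) {
                Callable<List<String>> task = () -> lister.apply(path);
                pending.put(path, pool.submit(task));
            }
            // Collect results; get() blocks until each task has finished.
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while loading", ex);
        } catch (ExecutionException ex) {
            // Surface the failure of any single partition listing.
            throw new RuntimeException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("year=2016", "year=2017");
        // Fake lister that returns two made-up file names per partition.
        Map<String, List<String>> files = loadAll(
                partitions,
                p -> new ArrayList<>(Arrays.asList(p + "/f0", p + "/f1")),
                4);
        System.out.println(files);
    }
}
```

      Because each listing call spends most of its time waiting on the remote store, the pool size in a sketch like this would likely be chosen larger than the number of CPU cores; the right value is an open tuning question here, not something the report specifies.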

      Stack-Trace Count Percentage(%) Total
      com.amazonaws.services.s3.AmazonS3Client.listObjects(ListObjectsRequest) 4,340 75.649 83,489,694,712
      ---org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(ListObjectsRequest) 4,340 75.649 83,489,694,712
      ------org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(Path,-String,-Set) 3,256 56.754 63,540,096,016
      ---------org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(Path,-boolean) 3,256 56.754 63,540,096,016
      ------------org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(Path) 3,256 56.754 63,540,096,016
      ---------------org.apache.hadoop.fs.FileSystem.exists(Path) 2,178 37.964 45,375,122,798
      ------------------org.apache.hadoop.fs.s3a.S3AFileSystem.exists(Path) 2,178 37.964 45,375,122,798
      ---------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) 1,082 18.86 23,383,160,065
      ------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) 1,082 18.86 23,383,160,065
      ---------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean) 1,082 18.86 23,383,160,065
      ------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table,-boolean,-boolean,-Set) 1,082 18.86 23,383,160,065
      ---------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table) 1,082 18.86 23,383,160,065
      ---------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition) 1,096 19.104 21,991,962,733
      ------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) 1,096 19.104 21,991,962,733
      ---------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) 1,096 19.104 21,991,962,733
      ------------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean) 1,096 19.104 21,991,962,733
      --------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor) 1,078 18.79 18,164,973,218
      ------------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean) 1,078 18.79 18,164,973,218
      ---------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap) 1,078 18.79 18,164,973,218
      ------------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition) 1,078 18.79 18,164,973,218
      ---------------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition) 1,078 18.79 18,164,973,218
      ------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) 1,078 18.79 18,164,973,218
      ---------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) 1,078 18.79 18,164,973,218
      ------org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.<init>(Listing,-Path,-ListObjectsRequest) 1,084 18.895 19,949,598,696
      ---------org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Path,-ListObjectsRequest,-PathFilter,-Listing$FileStatusAcceptor,-RemoteIterator) 1,084 18.895 19,949,598,696
      ------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor) 1,084 18.895 19,949,598,696
      ---------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean) 1,084 18.895 19,949,598,696
      ------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap) 1,084 18.895 19,949,598,696
      --------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition) 1,084 18.895 19,949,598,696

            People

              Assignee: bharathv Bharath Vissapragada
              Reporter: mmokhtar Mostafa Mokhtar