Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 2.9.0
Description
Loading metadata for partitions with custom paths is 4x slower than for partitions without custom paths. The slowdown is due to O(N^2) lookups: for each partition, a linear List.contains() scan checks whether the partition's directory is already in the list of directories to load.
The List should ideally be replaced with a Set.
From https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
List<Path> dirsToLoad = Lists.newArrayList(tblLocation);
if (!dirsToLoad.contains(partDir) &&
    !FileSystemUtil.isDescendantPath(partDir, tblLocation)) {
  // This partition has a custom filesystem location. Load its file/block
  // metadata separately by adding it to the list of dirs to load.
  dirsToLoad.add(partDir);
}
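A minimal sketch of the suggested List-to-Set change, not the actual patch. It assumes java.net.URI as a stand-in for org.apache.hadoop.fs.Path (the profile shows Path.equals delegating to URI.equals), and the class name, collectDirsToLoad, and the simplified isDescendantPath are hypothetical helpers written only for illustration. With a hash-based Set, the explicit contains() check becomes unnecessary because add() ignores duplicates in expected constant time.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DirsToLoadSketch {
  // Hypothetical, simplified stand-in for FileSystemUtil.isDescendantPath():
  // treats partDir as a descendant if its string form starts with the table
  // location followed by a path separator.
  static boolean isDescendantPath(URI partDir, URI tblLocation) {
    String root = tblLocation.toString();
    if (!root.endsWith("/")) root += "/";
    return partDir.toString().startsWith(root);
  }

  // Collects the table location plus every distinct custom partition directory.
  // LinkedHashSet keeps insertion order and makes duplicate detection a hash
  // lookup instead of a linear scan over all directories seen so far.
  static Set<URI> collectDirsToLoad(URI tblLocation, List<URI> partDirs) {
    Set<URI> dirsToLoad = new LinkedHashSet<>();
    dirsToLoad.add(tblLocation);
    for (URI partDir : partDirs) {
      if (!isDescendantPath(partDir, tblLocation)) {
        // Custom filesystem location: add() is a no-op if already present.
        dirsToLoad.add(partDir);
      }
    }
    return dirsToLoad;
  }

  public static void main(String[] args) {
    URI tbl = URI.create("hdfs://nn/warehouse/t");
    List<URI> parts = Arrays.asList(
        URI.create("hdfs://nn/warehouse/t/p=1"),  // under the table location
        URI.create("hdfs://nn/custom/p=2"),       // custom location
        URI.create("hdfs://nn/custom/p=2"),       // duplicate custom location
        URI.create("hdfs://nn/custom/p=3"));      // another custom location
    System.out.println(collectDirsToLoad(tbl, parts));
  }
}
```

In this sketch the per-partition cost no longer grows with the number of directories already collected, so the overall pass is linear in the number of partitions rather than quadratic.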
From Java Mission Control:
Stack Trace | Sample Count | Percentage(%)
java.lang.Thread.run() | 73,611 | 97.157
java.util.concurrent.ThreadPoolExecutor$Worker.run() | 73,611 | 97.157
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) | 73,611 | 97.157
java.util.concurrent.FutureTask.run() | 73,595 | 97.136
org.apache.impala.catalog.TableLoadingMgr$2.call() | 73,555 | 97.083
org.apache.impala.catalog.TableLoadingMgr$2.call() | 73,555 | 97.083
org.apache.impala.catalog.TableLoader.load(Db, String) | 73,555 | 97.083
org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table) | 73,555 | 97.083
org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set) | 73,555 | 97.083
org.apache.impala.catalog.HdfsTable.loadAllPartitions(List, Table) | 73,508 | 97.021
java.util.ArrayList.contains(Object) | 70,094 | 92.515
java.util.ArrayList.indexOf(Object) | 70,094 | 92.515
org.apache.hadoop.fs.Path.equals(Object) | 69,462 | 91.681
java.net.URI.equals(Object) | 69,462 | 91.681
Issue Links
- is related to
  - IMPALA-4789 Slow metadata loading with many partitions that have inconsistent HDFS path qualification (Resolved)