Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
Impala 3.1.0
-
None
Description
Currently, we have a neat breakdown of per-query LocalCatalog cache metrics in the query runtime profile (in LocalCatalog mode). For example:
- CatalogFetch.ColumnStats.Misses: 13
- CatalogFetch.ColumnStats.Requests: 13
- CatalogFetch.ColumnStats.Time: 17ms
- CatalogFetch.Config.Misses: 1
- CatalogFetch.Config.Requests: 1
- CatalogFetch.Config.Time: 4ms
- CatalogFetch.DatabaseList.Hits: 1
- CatalogFetch.DatabaseList.Requests: 1
- CatalogFetch.DatabaseList.Time: 0
- CatalogFetch.PartitionLists.Misses: 1
- CatalogFetch.PartitionLists.Requests: 1
- CatalogFetch.PartitionLists.Time: 5ms
- CatalogFetch.Partitions.Hits: 48
- CatalogFetch.Partitions.Misses: 24
- CatalogFetch.Partitions.Requests: 72
- CatalogFetch.Partitions.Time: 26ms
- CatalogFetch.RPCs.Bytes: 33.96 KB (34775)
- CatalogFetch.RPCs.Requests: 4
- CatalogFetch.RPCs.Time: 358ms
- CatalogFetch.TableNames.Hits: 2
- CatalogFetch.TableNames.Requests: 2
- CatalogFetch.TableNames.Time: 0
- CatalogFetch.Tables.Misses: 1
- CatalogFetch.Tables.Requests: 1
- CatalogFetch.Tables.Time: 359ms
The idea here is to aggregate these across all queries and present them on the coordinator web UI so that we can answer questions like the following:
- What types of requests account for the majority of cache hits/misses?
- What types of requests spend the most time in RPCs or fetch the most RPC data?
.......
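To make the proposal concrete, here is a rough sketch of the aggregation step, written offline in Python against profile text rather than inside the coordinator. It is not how Impala implements or would implement this: the function names (`parse_counters`, `aggregate`, `rank_categories`) are hypothetical, and the parser only assumes the counter format shown in the example above (plain counts, `ms` times, and byte values with the raw count in parentheses).

```python
import re
from collections import defaultdict

# Matches counters in the form shown in the profile excerpt, e.g.
# "CatalogFetch.Partitions.Hits: 48" or "CatalogFetch.RPCs.Bytes: 33.96 KB (34775)".
# Assumption: other pretty-printed forms (such as multi-unit times) are not handled.
COUNTER_RE = re.compile(
    r"CatalogFetch\.(?P<category>\w+)\.(?P<metric>\w+):\s*"
    r"(?P<value>[\d.]+)\s*(?P<unit>ms|[KMG]?B)?(?:\s*\((?P<raw>\d+)\))?"
)


def parse_counters(profile_text):
    """Return {(category, metric): value} for one query profile."""
    counters = {}
    for m in COUNTER_RE.finditer(profile_text):
        # Prefer the exact raw value in parentheses (bytes) when present.
        if m.group("raw") is not None:
            value = int(m.group("raw"))
        else:
            value = float(m.group("value"))
        counters[(m.group("category"), m.group("metric"))] = value
    return counters


def aggregate(profiles):
    """Sum every counter across an iterable of profile texts."""
    totals = defaultdict(lambda: defaultdict(float))
    for text in profiles:
        for (category, metric), value in parse_counters(text).items():
            totals[category][metric] += value
    return totals


def rank_categories(totals, metric):
    """List (category, total) pairs for one metric, largest first."""
    ranked = [(c, m[metric]) for c, m in totals.items() if metric in m]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)
```

With totals in this shape, `rank_categories(totals, "Misses")` or `rank_categories(totals, "Time")` gives the kind of per-request-type ranking the questions above ask for. A real implementation would more likely accumulate the counters directly in the coordinator instead of re-parsing profile text.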