Details

- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Labels: ghx-label-1
Description
Currently, when we create a new Iceberg table in a HadoopCatalog, we create a new HadoopCatalog instance for each of these tables here.
The issue with this is that a catalog object such as HadoopCatalog holds an Iceberg FileIO instance, and the memory consumption of such an instance can be measured in MBs. This can blow up the catalog/localCatalog memory even if the Iceberg tables in the HadoopCatalog are empty.
As a solution, we should have a kind of HadoopCatalog store that caches HadoopCatalog objects keyed by their location string: on a lookup we either reuse a cached HadoopCatalog or create a new one and add it to the store. With this approach, tables under the same HadoopCatalog location would share the same HadoopCatalog instance, and we would no longer end up with as many FileIO instances as we have tables in the HadoopCatalog.
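A minimal sketch of the proposed store, assuming a concurrent map keyed by the catalog location string. The `HadoopCatalog` class below is a lightweight stand-in for the real `org.apache.iceberg.hadoop.HadoopCatalog`, and the `HadoopCatalogStore`/`getOrCreate` names are hypothetical, not existing Iceberg API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for org.apache.iceberg.hadoop.HadoopCatalog; in the real code the
// constructor would also set up the memory-heavy FileIO instance.
class HadoopCatalog {
    final String location;

    HadoopCatalog(String location) {
        this.location = location;
    }
}

// Hypothetical store: one shared HadoopCatalog per catalog location.
class HadoopCatalogStore {
    private static final Map<String, HadoopCatalog> CACHE = new ConcurrentHashMap<>();

    // Returns the cached catalog for this location, creating it only on first use.
    // computeIfAbsent guarantees at most one instance per key under concurrency.
    static HadoopCatalog getOrCreate(String location) {
        return CACHE.computeIfAbsent(location, HadoopCatalog::new);
    }
}

public class Demo {
    public static void main(String[] args) {
        HadoopCatalog a = HadoopCatalogStore.getOrCreate("hdfs://nn/warehouse");
        HadoopCatalog b = HadoopCatalogStore.getOrCreate("hdfs://nn/warehouse");
        HadoopCatalog c = HadoopCatalogStore.getOrCreate("hdfs://nn/other");
        System.out.println(a == b); // same location -> same shared instance
        System.out.println(a == c); // different location -> separate instance
    }
}
```

In a production version the store would likely also need an eviction policy (and `close()` calls on evicted catalogs to release their FileIO resources), but the key idea is the single shared instance per location.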