Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Guava's LoadingCache, which we use in CatalogdMetaProvider, has a well-known race that we do not currently handle:
- thread 1 gets a cache miss and sends a request to fetch some data from the catalogd. It fetches version 1 (v1) of the catalog object and is then context-switched out or otherwise delayed
- thread 2 receives an invalidation for the same object because it has changed to v2. It calls 'invalidate' on the cache, but nothing is cached yet, so the call has no effect
- thread 1 puts v1 of the object into the cache
In essence we've "missed" an invalidation. This is also described in this nice post: https://softwaremill.com/race-condition-cache-guava-caffeine/
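The interleaving above can be replayed deterministically. The following is only a minimal sketch: it uses a plain ConcurrentHashMap as a stand-in for the cache and latches to force the ordering, and the class name, the key "tbl" and the method `replay` are hypothetical and not part of Impala or Guava.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the cache; not Impala or Guava code.
class MissedInvalidationDemo {
  static final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();

  /** Replays the interleaving and returns the version left in the cache. */
  static Integer replay() {
    cache.clear();
    CountDownLatch fetched = new CountDownLatch(1);
    CountDownLatch invalidated = new CountDownLatch(1);

    Thread loader = new Thread(() -> {
      int version = 1;               // thread 1: cache miss, fetches v1
      fetched.countDown();
      try {
        invalidated.await();         // thread 1 is "context switched out"
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      cache.put("tbl", version);     // thread 1 stores the stale v1
    });
    loader.start();

    try {
      fetched.await();
      cache.remove("tbl");           // thread 2: invalidation for v2, a no-op
      invalidated.countDown();
      loader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return cache.get("tbl");
  }

  public static void main(String[] args) {
    // Prints "cached version after invalidation: 1"
    System.out.println("cached version after invalidation: " + replay());
  }
}
```

The cache ends up holding v1 even though the invalidation for v2 was already processed.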
The race is unlikely, but it could leave a stale entry in the cache and produce results that are hard to reason about, so we should look into a fix.
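One possible direction, sketched here under assumptions and not necessarily the fix adopted for this issue, is to register a placeholder future in the cache before the load starts. An invalidation that arrives during the load then removes the placeholder, so the stale result reaches the caller that started the load but is never left in the cache. `FutureCache` and its methods are hypothetical names, not Guava or Impala APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: caches futures so in-flight loads can be invalidated.
class FutureCache<K, V> {
  private final ConcurrentMap<K, CompletableFuture<V>> map = new ConcurrentHashMap<>();

  V get(K key, Supplier<V> loader) {
    CompletableFuture<V> mine = new CompletableFuture<>();
    // Register the placeholder BEFORE loading so invalidate() can see it.
    CompletableFuture<V> existing = map.putIfAbsent(key, mine);
    if (existing != null) {
      return existing.join();        // cached, or another load is in flight
    }
    try {
      V value = loader.get();
      mine.complete(value);          // only cached if 'mine' is still in the map
      return value;
    } catch (RuntimeException | Error e) {
      map.remove(key, mine);         // do not cache failed loads
      mine.completeExceptionally(e);
      throw e;
    }
  }

  /** Removes the entry, including a placeholder for an in-flight load. */
  void invalidate(K key) {
    map.remove(key);
  }

  /** Returns the cached value, or null if absent or still loading. */
  V getIfPresent(K key) {
    CompletableFuture<V> f = map.get(key);
    return (f != null && f.isDone() && !f.isCompletedExceptionally()) ? f.join() : null;
  }

  public static void main(String[] args) {
    FutureCache<String, Integer> c = new FutureCache<>();
    // The loader calls invalidate() itself to simulate thread 2 arriving mid-load.
    Integer first = c.get("tbl", () -> { c.invalidate("tbl"); return 1; });
    System.out.println(first);                   // 1 (returned to the caller)
    System.out.println(c.getIfPresent("tbl"));   // null (v1 was not cached)
    System.out.println(c.get("tbl", () -> 2));   // 2 (fresh load)
  }
}
```

A caller that was already waiting on the placeholder when the invalidation arrived still receives v1 once; the sketch only guarantees that v1 does not stay cached.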
Issue Links
- causes
  - IMPALA-8567 Many random catalog consistency issues with catalog v2/event processor (Resolved)
- is depended upon by
  - IMPALA-8627 Re-enable catalog v2 in containers (Resolved)
- is related to
  - IMPALA-12670 CatalogdMetaProvider.getIfPresent() not throwing the underlying InconsistentMetadataFetchException (Resolved)
  - IMPALA-12699 Coordinator should retry GetPartialCatalogObject request and apply a recv timeout (Resolved)
  - IMPALA-12682 Consider other cache implementation in CatalogdMetaProvider (Open)