Details
- Epic
- Status: Open
- Blocker
- Resolution: Unresolved
- None
- 0
- Async Metadata Indexing
Description
For now, the metadata table has only the FILES partition. Our suggestion is to stop all processes and then restart them one by one with the metadata table enabled. The first process to start back up will trigger bootstrapping of the metadata table.
But this may not work out well as we add more and more partitions to the metadata table.
We need to support bootstrapping one or more partitions in the metadata table while regular writers and table services are in progress.
Penning down my thoughts and ideas.
I tried to find a way to get this done without adding an additional lock, but could not crack it. So, here is one way to support async bootstrap.
Introduce a file called "available_partitions" at some special location under the metadata table. This file will contain the list of partitions that are ready to receive updates from the data table. That is, when we do synchronous updates from the data table to the metadata table and the metadata table has N partitions, we need to know which partitions are fully bootstrapped and ready to take updates. This file will maintain that info. We can debate how to maintain this info (table properties, a separate file, etc.), but for now let's say this file is the source of truth. The idea is that any async bootstrap process will add the newly bootstrapped partition to this file once the bootstrap is fully complete, so that all other writers know which partitions to update.
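To make the "available_partitions" idea concrete, here is a minimal sketch. It assumes the file is a JSON list stored directly under the metadata table directory; the file format, the location, and the helper names `read_available_partitions` and `mark_partition_available` are all hypothetical, since the proposal leaves those details open.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List

# Hypothetical file name and format; the proposal does not fix either.
AVAILABLE_PARTITIONS_FILE = "available_partitions"


def read_available_partitions(metadata_dir: Path) -> List[str]:
    """Return the fully bootstrapped partitions (empty if the file is absent)."""
    path = metadata_dir / AVAILABLE_PARTITIONS_FILE
    if not path.exists():
        return []
    return json.loads(path.read_text())


def mark_partition_available(metadata_dir: Path, partition: str) -> None:
    """Record a partition as fully bootstrapped (no-op if already listed)."""
    partitions = read_available_partitions(metadata_dir)
    if partition in partitions:
        return
    partitions.append(partition)
    # Write a temp file first and swap it in, so readers never see a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=str(metadata_dir))
    with os.fdopen(fd, "w") as tmp:
        json.dump(partitions, tmp)
    os.replace(tmp_name, str(metadata_dir / AVAILABLE_PARTITIONS_FILE))
```

The temp-file-then-replace step is only a local-filesystem illustration; on other storage the update would need whatever atomic write primitive that storage offers.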
We also need to introduce a metadata_lock.
Here is how writers and async bootstrap will pan out.
Regular writer or any async table service (compaction, etc.):
When changes need to be applied to the metadata table: // FYI, as of today this already happens within the data table lock.
Take the metadata_lock.
Read the contents of available_partitions.
Prepare records and apply updates to the metadata table.
Release the lock.
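The writer-side steps above can be sketched as follows. This is an in-memory illustration only: `MetadataTable` and `apply_commit` are hypothetical names, a `threading.Lock` stands in for the metadata_lock, and a plain list stands in for the available_partitions file.

```python
import threading
from typing import Dict, List


class MetadataTable:
    """In-memory stand-in for the metadata table (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.metadata_lock = threading.Lock()
        # Stands in for the contents of the available_partitions file.
        self.available_partitions: List[str] = []
        # Partition name -> commit times applied to that partition.
        self.partitions: Dict[str, List[str]] = {}


def apply_commit(table: MetadataTable, commit_time: str) -> List[str]:
    """Apply one data table commit to every fully bootstrapped partition."""
    with table.metadata_lock:                      # take metadata_lock
        ready = list(table.available_partitions)   # read available_partitions
        for partition in ready:                    # prep records and apply updates
            table.partitions.setdefault(partition, []).append(commit_time)
    return ready                                   # lock is released on exit
```

Partitions that are still being bootstrapped are simply not in `ready`, so the writer skips them and leaves those commits for the bootstrap process to pick up.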
Async bootstrap process:
Start bootstrapping a given partition (e.g. files) in the metadata table.
Do it in a loop. For example, the first iteration of bootstrap could take 10 mins; the next iteration catches up on the new commits that happened during those 10 mins, which could take 1 min; and then we go for another loop.
Whenever the total bootstrap time for a round is ~1 min or less, the next round can be the final iteration.
During the final iteration, take the metadata_lock. // This lock should not be held for more than a few secs.
Apply any new commits that happened while the last iteration of bootstrap was running.
Update the "available_partitions" file with the partition that is now fully bootstrapped.
Release the lock.
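A rough sketch of the catch-up loop described above. Everything here is an assumption made for illustration: commits are modeled as a list of strings, `catch_up` and `bootstrap_partition` are hypothetical names, and the 60 second threshold mirrors the "~1 min" mentioned in the steps.

```python
import threading
import time
from typing import Callable, List


def catch_up(data_commits: List[str], indexed: List[str]) -> None:
    """Index every completed commit that the partition has not seen yet."""
    pending = list(data_commits)[len(indexed):]
    indexed.extend(pending)


def bootstrap_partition(
    partition: str,
    data_commits: List[str],
    indexed: List[str],
    available_partitions: List[str],
    metadata_lock: threading.Lock,
    final_threshold_secs: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run catch-up rounds without the lock, then one final round under it."""
    rounds = 0
    while True:
        started = clock()
        catch_up(data_commits, indexed)
        rounds += 1
        # A short round means little is left, so the next round can be final.
        if clock() - started <= final_threshold_secs:
            break
    with metadata_lock:
        # Final iteration: writers are blocked, so no commit can slip past us.
        catch_up(data_commits, indexed)
        available_partitions.append(partition)
    return rounds
```

Because only the final, short round runs under the lock, regular writers are blocked for roughly the time it takes to apply the last few commits rather than for the whole bootstrap.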
metadata_lock: this ensures that when async bootstrap is in its final stages, we do not miss any commit that is nearing completion. Every such commit is applied exactly one way: either async bootstrap applies the update, or the writer itself updates the partition directly if the bootstrap is fully complete.
Regarding "available_partitions":
I was looking for a way to know which partitions are fully ready to take direct updates from regular writers, and hence chose this approach. We can also think about creating a temp partition (files_temp or something) while bootstrap is in progress and then renaming it to the original partition name once bootstrap is fully complete. If we can rename these partitions reliably (i.e., once the files partition is available, it is fully ready to take direct updates), we can take this route as well.
Here is how it might pan out with folder/partition renaming.
Regular writer or any async table service (compaction, etc.):
When changes need to be applied to the metadata table: // FYI, as of today this already happens within the data table lock.
Take the metadata_lock.
List the partitions in the metadata table. Ignore temp partitions.
Prepare records and apply updates to the metadata table.
Release the lock.
Async bootstrap process:
Start bootstrapping a given partition (e.g. files) in the metadata table. Create a temp folder for the partition that is getting bootstrapped (e.g. files_temp).
Do it in a loop. For example, the first iteration of bootstrap could take 10 mins; the next iteration catches up on the new commits that happened during those 10 mins, which could take 1 min; and then we go for another loop.
Whenever the total bootstrap time for a round is ~1 min or less, the next round can be the final iteration.
During the final iteration, take the metadata_lock. // This lock should not be held for more than a few secs.
Apply any new commits that happened while the last iteration of bootstrap was running.
Rename files_temp to files.
Release the lock.
Note: we just need to ensure that folder renaming is consistent. On a crash, the new folder is either fully intact or not available at all. The contents of the old folder do not matter.
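The rename-based variant could look roughly like this on a local filesystem. The `_temp` suffix and the helper names `list_ready_partitions` and `publish_partition` are hypothetical. The sketch relies on `os.rename` being atomic on POSIX filesystems; object stores generally do not give that guarantee for directory renames, which is exactly the consistency concern in the note above.

```python
import os
import threading
from pathlib import Path
from typing import List

TEMP_SUFFIX = "_temp"  # hypothetical naming convention, e.g. files_temp


def list_ready_partitions(metadata_dir: Path) -> List[str]:
    """Writer side: list partitions, ignoring any that are still temp."""
    return sorted(
        p.name
        for p in metadata_dir.iterdir()
        if p.is_dir() and not p.name.endswith(TEMP_SUFFIX)
    )


def publish_partition(
    metadata_dir: Path, partition: str, metadata_lock: threading.Lock
) -> None:
    """Bootstrap side: final step, performed while holding metadata_lock."""
    temp_dir = metadata_dir / (partition + TEMP_SUFFIX)
    final_dir = metadata_dir / partition
    with metadata_lock:
        # Applying the last few commits to temp_dir would happen here.
        # The rename makes the partition visible to writers in one step.
        os.rename(str(temp_dir), str(final_dir))
```

With this approach the directory listing itself plays the role of "available_partitions", so there is no separate file to keep in sync.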
Failures:
a. If bootstrap fails midway, then as long as "files" has not been created, we can delete files_temp and start all over again.
b. If bootstrap fails just after the rename, we should again be good, except that the lock may not have been released. We need to ensure the metadata_lock is released. To tackle this, if acquiring the metadata_lock from a regular writer fails, the writer will just proceed to listing partitions and applying updates.
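Both failure cases can be sketched as below, again on a local filesystem. `recover_bootstrap` and `list_partitions_for_update` are hypothetical names, the `_temp` suffix is an assumed convention, and the 5 second lock timeout is an arbitrary placeholder rather than a proposed value.

```python
import shutil
import threading
from pathlib import Path
from typing import List


def recover_bootstrap(metadata_dir: Path, partition: str) -> str:
    """Case (a): decide what to do when a bootstrap process restarts."""
    final_dir = metadata_dir / partition
    temp_dir = metadata_dir / (partition + "_temp")
    if final_dir.exists():
        return "complete"        # the rename happened, nothing to redo
    if temp_dir.exists():
        shutil.rmtree(temp_dir)  # discard partial work
    return "restart"             # bootstrap starts all over again


def list_partitions_for_update(
    metadata_dir: Path, metadata_lock: threading.Lock, timeout_secs: float = 5.0
) -> List[str]:
    """Case (b): a writer proceeds even if the lock cannot be acquired."""
    acquired = metadata_lock.acquire(timeout=timeout_secs)
    try:
        return sorted(
            p.name
            for p in metadata_dir.iterdir()
            if p.is_dir() and not p.name.endswith("_temp")
        )
    finally:
        if acquired:
            metadata_lock.release()
```

Note that proceeding without the lock is only safe under the assumption stated in (b), namely that a lock left held after the rename means the partition is already fully published.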
Issue Links
- is related to
  - HUDI-3175 Support INDEX action for async metadata indexing (Closed)
  - HUDI-3173 Introduce new INDEX action type (Closed)
  - HUDI-3174 Implement metadata filesystem view changes to support INDEX action type (Closed)
  - HUDI-3176 Add index commit metadata (Closed)
  - HUDI-3177 CREATE INDEX command (Closed)
- relates to
  - HUDI-3275 Add tests for async metadata indexing (Closed)