Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.5.1
Fix Version/s: None
Component/s: None
Labels: None
Description
It is not necessary to check the DAG's commit status in RecoveryTransition, because we already check that in RecoveryParser by using the summary event.
Comments copied from TEZ-1737:
But even if the non-summary VertexFinishedEvent is seen, its VertexRecoverableEventsGeneratedEvent may still be lost. I think there's no guarantee that VertexRecoverableEventsGeneratedEvent is logged before VertexFinishedEvent.
The expectation was that all tasks are completed before a vertex has finished. Also, a TaskFinishedEvent is only seen after all its data movement events are generated and therefore logged.
The handling is for the general case, where many data movement events are generated, the commit starts, and then the commit ends. In a scenario where the commit starts but does not end, the summary log helps catch the problem. In a scenario where the commit finished successfully, the AM may still have crashed before all data movement events were stored to recovery. In that case we cannot do anything, because the commit has already been done and we have no idea what was lost. The crux of the answer to your question is that a committer cannot be invoked twice.
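To make the commit cases above concrete, here is a minimal sketch of how a scan over the summary log could distinguish them. This is not Tez's RecoveryParser: the class, the enum values, and the method names are all hypothetical, and the summary log is modelled as a plain list of event kinds.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the Tez code base.
final class CommitStateSketch {
    // Assumed summary event kinds, modelled only for this illustration.
    enum SummaryEvent { COMMIT_STARTED, COMMIT_FINISHED }

    enum CommitState { NOT_STARTED, IN_PROGRESS, COMPLETED }

    // Classify the commit state from the events found in the summary log.
    static CommitState classify(List<SummaryEvent> summaryLog) {
        boolean started = false;
        boolean finished = false;
        for (SummaryEvent e : summaryLog) {
            if (e == SummaryEvent.COMMIT_STARTED) {
                started = true;
            } else if (e == SummaryEvent.COMMIT_FINISHED) {
                finished = true;
            }
        }
        if (finished) {
            return CommitState.COMPLETED;
        }
        return started ? CommitState.IN_PROGRESS : CommitState.NOT_STARTED;
    }

    // A commit that started but never ended cannot be safely re-run,
    // because a committer cannot be invoked twice.
    static boolean canRecover(List<SummaryEvent> summaryLog) {
        return classify(summaryLog) != CommitState.IN_PROGRESS;
    }
}
```

Note that the COMPLETED case says nothing about whether every data movement event reached the recovery log, which is the gap described above.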
Agree that VertexRecoverableEventsGeneratedEvent is a different problem. In such cases, I believe that if VertexRecoverableEventsGeneratedEvent is not seen before a VertexFinished is seen, there needs to be some additional handling for that scenario too. If a VertexRecoverableEventsGeneratedEvent is always guaranteed to be generated for a vertex and it is not seen, then that means it is a potential non-recoverable case when the vertex itself was seen to have been completed.
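The additional handling suggested here could look roughly like the check below. It is only a sketch under the assumption stated in the comment, namely that a VertexRecoverableEventsGeneratedEvent is always generated for a vertex; the class, enum, and method names are hypothetical and not part of Tez.

```java
import java.util.Set;

// Hypothetical sketch; none of these names come from the Tez code base.
final class VertexRecoverySketch {
    // Assumed per-vertex event kinds recovered from the non-summary log.
    enum VertexEvent { RECOVERABLE_EVENTS_GENERATED, VERTEX_FINISHED }

    // Flags a vertex as potentially non-recoverable when it was seen to
    // finish but its recoverable-events record is missing. The flag
    // alwaysGenerated encodes the assumption that the record is guaranteed
    // to be generated for this vertex.
    static boolean isPotentiallyNonRecoverable(Set<VertexEvent> seen,
                                               boolean alwaysGenerated) {
        return alwaysGenerated
                && seen.contains(VertexEvent.VERTEX_FINISHED)
                && !seen.contains(VertexEvent.RECOVERABLE_EVENTS_GENERATED);
    }
}
```

If the guarantee does not hold for a given vertex, the check deliberately returns false, since a missing record is then not evidence of loss.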