[SAMZA-1523] Cleanup table entries before shutting down the processor - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

When a processor shuts down, we want to remove its expired entries from the Azure Table. By default, the Azure Table service uses optimistic concurrency (ETag-based locking), so a delete succeeds only if the entity has not changed since it was read. While the coordinator thread is cleaning up during shutdown, the heartbeat thread may update the same entry, which changes its ETag. The coordinator's delete then fails with "Precondition Failed", an exception is written to the log, and the entry is never cleared.

`
2017-11-30 15:23:32.804 [JMVersionUpgradeScheduler-0] AzureJobCoordinator [INFO] pid=05133d0a-dd85-4178-a97c-2c98dc617308new version 5 of the job model got confirmed
2017-11-30 15:23:32.833 [HeartbeatScheduler-0] HeartbeatScheduler [INFO] Updating heartbeat for processor ID: 05133d0a-dd85-4178-a97c-2c98dc617308 and job model version: 4
2017-11-30 15:23:32.905 [JMVersionUpgradeScheduler-0] TableUtils [ERROR] Azure storage exception while deleting processor entity with job model version: 4and pid: 05133d0a-dd85-4178-a97c-2c98dc617308
com.microsoft.azure.storage.table.TableServiceException: Precondition Failed
at com.microsoft.azure.storage.table.TableServiceException.generateTableServiceException(TableServiceException.java:52)
at com.microsoft.azure.storage.table.TableOperation$1.preProcessResponse(TableOperation.java:319)
at com.microsoft.azure.storage.table.TableOperation$1.preProcessResponse(TableOperation.java:299)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:139)
at com.microsoft.azure.storage.table.TableOperation.performDelete(TableOperation.java:281)
at com.microsoft.azure.storage.table.TableOperation.execute(TableOperation.java:685)
at com.microsoft.azure.storage.table.CloudTable.execute(CloudTable.java:529)
at com.microsoft.azure.storage.table.CloudTable.execute(CloudTable.java:496)
at org.apache.samza.util.TableUtils.deleteProcessorEntity(TableUtils.java:157)
at org.apache.samza.coordinator.AzureJobCoordinator.onNewJobModelConfirmed(AzureJobCoordinator.java:448)
at org.apache.samza.coordinator.AzureJobCoordinator.onNewJobModelAvailable(AzureJobCoordinator.java:419)
at org.apache.samza.coordinator.AzureJobCoordinator.lambda$createJMVersionUpgradeListener$3(AzureJobCoordinator.java:248)
at org.apache.samza.coordinator.scheduler.JMVersionUpgradeScheduler.lambda$scheduleTask$0(JMVersionUpgradeScheduler.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2017-11-30 15:23:32.906 [JMVersionUpgradeScheduler-0] AzureJobCoordinator [ERROR] Exception in Job Model Version Upgrade Scheduler. Stopping the processor...
`
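The interleaving in the log above can be replayed deterministically with a small in-memory model. This is a sketch, not the Azure SDK: `InMemoryTable`, `PreconditionFailedException`, and the `jmVersion/pid` key format are hypothetical, and ETags are modeled as counters that change on every write. The coordinator reads the entity, the heartbeat thread rewrites it, and the coordinator's conditional delete then fails and leaves the entry behind.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for an Azure Table (NOT the Azure SDK).
 * Every write assigns a fresh ETag; a delete succeeds only if the caller's
 * ETag matches the stored one (If-Match semantics).
 */
public class OptimisticDeleteRace {

    static class PreconditionFailedException extends RuntimeException {
        PreconditionFailedException(String key) {
            super("Precondition Failed for " + key);
        }
    }

    static class InMemoryTable {
        private final Map<String, Long> etags = new HashMap<>();
        private long nextEtag = 1;

        // Insert or replace a row; returns the row's new ETag.
        synchronized long upsert(String key) {
            long etag = nextEtag++;
            etags.put(key, etag);
            return etag;
        }

        // Conditional delete: fails if the row changed since expectedEtag was read.
        synchronized void delete(String key, long expectedEtag) {
            Long current = etags.get(key);
            if (current == null || current != expectedEtag) {
                throw new PreconditionFailedException(key);
            }
            etags.remove(key);
        }

        synchronized boolean contains(String key) {
            return etags.containsKey(key);
        }
    }

    /** Replays the logged interleaving; returns true if cleanup failed and the row survived. */
    static boolean cleanupFailsAfterHeartbeat() {
        InMemoryTable table = new InMemoryTable();
        String key = "4/05133d0a";                      // hypothetical "jmVersion/pid" key
        long etagSeenByCoordinator = table.upsert(key); // coordinator reads the entity
        table.upsert(key);                              // heartbeat thread rewrites the same row
        try {
            table.delete(key, etagSeenByCoordinator);   // coordinator's cleanup
            return false;
        } catch (PreconditionFailedException e) {
            return table.contains(key);                 // the stale entry is left behind
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup failed: " + cleanupFailsAfterHeartbeat());
    }
}
```

Running `main` prints `cleanup failed: true`, mirroring the `TableServiceException: Precondition Failed` in the log.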

We should disable optimistic locking during the cleanup phase of shutdown. The ideal solution is probably to have more control over the various schedulers, but that is beyond the scope of this JIRA.
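A minimal sketch of the proposed fix, again using a hypothetical in-memory table rather than the Azure SDK. It relies on the Azure Table REST semantics in which an `If-Match: *` header makes a delete unconditional; the `force` flag and the `cleanup` helper are illustrative names, not Samza's API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for an Azure Table (NOT the Azure SDK).
 * A delete carries an If-Match value: either a specific ETag or the wildcard "*".
 */
public class UnconditionalCleanup {

    static final String WILDCARD = "*";

    static class InMemoryTable {
        private final Map<String, String> etags = new HashMap<>();
        private int version = 0;

        // Insert or replace a row; returns the row's new ETag.
        synchronized String upsert(String key) {
            String etag = "etag-" + (++version);
            etags.put(key, etag);
            return etag;
        }

        // Returns false (Precondition Failed) unless ifMatch is "*" or equals the stored ETag.
        synchronized boolean delete(String key, String ifMatch) {
            String current = etags.get(key);
            if (current == null) {
                return false;
            }
            if (!WILDCARD.equals(ifMatch) && !current.equals(ifMatch)) {
                return false;
            }
            etags.remove(key);
            return true;
        }

        synchronized boolean contains(String key) {
            return etags.containsKey(key);
        }
    }

    /** Shutdown cleanup: force=true skips the optimistic-concurrency check. */
    static boolean cleanup(InMemoryTable table, String key, String etagSeen, boolean force) {
        return table.delete(key, force ? WILDCARD : etagSeen);
    }

    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable();
        String key = "4/05133d0a";                             // hypothetical "jmVersion/pid" key
        String seen = table.upsert(key);                       // coordinator reads the entity
        table.upsert(key);                                     // heartbeat rewrites it concurrently
        System.out.println(cleanup(table, key, seen, false));  // false: ETag mismatch
        System.out.println(cleanup(table, key, seen, true));   // true: wildcard delete
        System.out.println(table.contains(key));               // false: entry removed
    }
}
```

The trade-off is that a forced delete can remove a row that a still-running heartbeat has just written. That is acceptable here only because the processor is leaving; coordinating the schedulers themselves is tracked separately in SAMZA-1549.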

Issue Links

is related to

SAMZA-1373 Write Coordination Services for Samza Standalone using Azure

In Progress

relates to

SAMZA-1549 Synchronization issue between heartbeat scheduler and job coordinator main thread during processor shutdown

Open

links to

GitHub Pull Request #379

People

Assignee: Navina Ramesh

Reporter: Navina Ramesh

Votes: 0

Watchers: 2

Dates

Created: 05/Dec/17 19:31

Updated: 17/Jan/18 18:12

Resolved: 17/Jan/18 18:12