Aurora / AURORA-1823

`createJob` API uses a single thread to move all tasks to PENDING

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • None
    • 0.17.0
    • None
    • None

    Description

      If you create a single job with many tasks (say, 10k+), the `createJob` API call takes a long time. This is because `createJob` returns only after all of the tasks have moved to PENDING, and it uses a single thread to move them. Here is a snippet of the logs:

      ...
      I1116 17:11:53.964 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57114-8aff8e77-3bde-4a83-99eb-8c6e52f14a7a state machine transition INIT -> PENDING
      I1116 17:11:53.965 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57114-8aff8e77-3bde-4a83-99eb-8c6e52f14a7a
      I1116 17:11:54.094 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57115-f5baa93f-78af-470d-bcdf-1d86c0b98c80 state machine transition INIT -> PENDING
      I1116 17:11:54.094 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57115-f5baa93f-78af-470d-bcdf-1d86c0b98c80
      I1116 17:11:54.223 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57116-0553d98c-f5de-4857-9a70-c5c748ddee03 state machine transition INIT -> PENDING
      I1116 17:11:54.224 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57116-0553d98c-f5de-4857-9a70-c5c748ddee03
      I1116 17:11:54.353 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57117-46e168f6-8753-4be0-873d-f18d1f562570 state machine transition INIT -> PENDING
      I1116 17:11:54.353 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57117-46e168f6-8753-4be0-873d-f18d1f562570
      I1116 17:11:54.482 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57118-ac94b4fb-f319-4ca2-b788-2ee093ef1c67 state machine transition INIT -> PENDING
      I1116 17:11:54.482 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57118-ac94b4fb-f319-4ca2-b788-2ee093ef1c67
      I1116 17:11:54.611 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57119-060ef7fc-7e17-4f8c-83dc-216550332153 state machine transition INIT -> PENDING
      I1116 17:11:54.612 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57119-060ef7fc-7e17-4f8c-83dc-216550332153
      I1116 17:11:54.741 [qtp1219612889-50, StateMachine$Builder:389] sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57120-c163c750-3658-44b7-b1ea-43f5d503f7c9 state machine transition INIT -> PENDING
      I1116 17:11:54.742 [qtp1219612889-50, TaskStateMachine:474] Adding work command SAVE_STATE for sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57120-c163c750-3658-44b7-b1ea-43f5d503f7c9
      ...
      

      Observe that a single Jetty thread (`qtp1219612889-50`) performs every one of these transitions, one task at a time.
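
To put a number on the cost, the gap between consecutive transitions can be read straight off the log lines. The following is a rough sketch in Python, not part of Aurora: the `transition_gaps` helper and its regular expression are assumptions made for illustration, and the task IDs are shortened to their instance numbers so the lines fit. The timestamps and thread name are taken from the snippet above.

```python
import re
from datetime import datetime

# INIT -> PENDING lines from the snippet above. Task IDs are shortened to
# "task-<instance>" for readability; timestamps and thread name are unchanged.
LOG = """\
I1116 17:11:53.964 [qtp1219612889-50, StateMachine$Builder:389] task-57114 state machine transition INIT -> PENDING
I1116 17:11:54.094 [qtp1219612889-50, StateMachine$Builder:389] task-57115 state machine transition INIT -> PENDING
I1116 17:11:54.223 [qtp1219612889-50, StateMachine$Builder:389] task-57116 state machine transition INIT -> PENDING
I1116 17:11:54.353 [qtp1219612889-50, StateMachine$Builder:389] task-57117 state machine transition INIT -> PENDING
I1116 17:11:54.482 [qtp1219612889-50, StateMachine$Builder:389] task-57118 state machine transition INIT -> PENDING
I1116 17:11:54.611 [qtp1219612889-50, StateMachine$Builder:389] task-57119 state machine transition INIT -> PENDING
I1116 17:11:54.741 [qtp1219612889-50, StateMachine$Builder:389] task-57120 state machine transition INIT -> PENDING
"""

# Captures the time of day and the thread name of each transition line.
PATTERN = re.compile(
    r"^I\d{4} (\d{2}:\d{2}:\d{2}\.\d{3}) \[([^,\]]+), StateMachine\$Builder:\d+\]"
    r" .* transition INIT -> PENDING$"
)


def transition_gaps(log_text):
    """Return (set of thread names, gaps in ms between consecutive transitions)."""
    threads, stamps = set(), []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
            threads.add(match.group(2))
    gaps = [round((b - a).total_seconds() * 1000) for a, b in zip(stamps, stamps[1:])]
    return threads, gaps


if __name__ == "__main__":
    threads, gaps = transition_gaps(LOG)
    mean_ms = sum(gaps) / len(gaps)
    print("threads:", threads)
    print("gaps (ms):", gaps)
    print("estimated minutes for 10,000 tasks:", 10_000 * mean_ms / 1000 / 60)
```

On this snippet the gaps are all 129 or 130 ms, so if that rate held for a whole job, 10,000 tasks would take a little over 21 minutes. That is an extrapolation from seven log lines, not a measurement.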

      We should leverage BatchWorker to have concurrent mutations here.
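
As a rough illustration of what concurrent mutations would buy, here is a minimal Python sketch. It is not Aurora's BatchWorker, whose API this issue does not show. `save_pending` is a hypothetical stand-in for one task's INIT -> PENDING transition plus its SAVE_STATE, simulated with a sleep, and the worker count is arbitrary. The sketch assumes the per-task cost is mostly waiting and that the underlying store accepts concurrent writes, neither of which the issue confirms.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def save_pending(task_id, delay=0.01):
    """Hypothetical stand-in for moving one task to PENDING and saving it."""
    time.sleep(delay)  # simulates the per-task wait seen in the logs
    return (task_id, "PENDING")


def move_all_serial(task_ids, delay=0.01):
    """One thread handles every task in turn, as in the log snippet."""
    return [save_pending(task_id, delay) for task_id in task_ids]


def move_all_concurrent(task_ids, workers=8, delay=0.01):
    """Fan the same work out over a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps input order, and the list is complete only once every
        # task has finished, so the caller still returns after all tasks
        # are PENDING.
        return list(pool.map(lambda task_id: save_pending(task_id, delay), task_ids))


if __name__ == "__main__":
    ids = list(range(40))
    for move_all in (move_all_serial, move_all_concurrent):
        start = time.perf_counter()
        moved = move_all(ids)
        elapsed = time.perf_counter() - start
        print(f"{move_all.__name__}: {len(moved)} tasks in {elapsed:.2f}s")
```

Both functions return the same result and keep the current contract that the call completes only when every task is PENDING. The only difference is how many transitions are in flight at once.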

          People

            zmanji Zameer Manji