Apache YuniKorn / YUNIKORN-793

Fix deadlock caused by listing queues while pods are pending scheduling

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.12.1
    • None

    Description

      `GetPartitionQueues` acquires the read lock more than once on the same call path. Once another thread is waiting for the write lock, Go's `RWMutex` blocks all new read-lock requests until that writer has been served. In short, the following execution order can cause a deadlock.

      1. Thread 0 holds the read lock.
      2. Thread 1 requests the write lock and blocks until thread 0 releases its read lock.
      3. Thread 0 requests the read lock again and blocks behind thread 1's pending write lock, so neither thread can proceed.

      See the `sync.RWMutex` documentation for more details (https://pkg.go.dev/sync#RWMutex).

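      The pprof output for the two blocked goroutines is shown below. The first is waiting for the write lock in `incPendingResource`; the second is waiting for the read lock in `IsLeafQueue`, called from `GetPartitionQueues`.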

      1 @ 0x43ada5 0x44ca85 0x44ca6e 0x46de27 0x47ce25 0x47e590 0x47e522 0x9c5753 0x9ad231 0x9fbcfa 0x9e4986 0x9e4851 0x9de655 0x9ff792 0x471c61
      #	0x46de26	sync.runtime_SemacquireMutex+0x46									/Users/chia7712/Library/go/default/src/runtime/sema.go:71
      #	0x47ce24	sync.(*Mutex).lockSlow+0x104										/Users/chia7712/Library/go/default/src/sync/mutex.go:138
      #	0x47e58f	sync.(*Mutex).Lock+0x8f											/Users/chia7712/Library/go/default/src/sync/mutex.go:81
      #	0x47e521	sync.(*RWMutex).Lock+0x21										/Users/chia7712/Library/go/default/src/sync/rwmutex.go:111
      #	0x9c5752	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).incPendingResource+0x52	/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:454
      
      
      1 @ 0x43ada5 0x44ca85 0x44ca6e 0x46de27 0x9c7eae 0x9c7e34 0x9c51f8 0x9c54ab 0xa59e45 0xa59e94 0x711024 0xa5b310 0x711024 0xa011b3 0x7145e3 0x70fb0d 0x471c61
      #	0x46de26	sync.runtime_SemacquireMutex+0x46									/Users/chia7712/Library/go/default/src/runtime/sema.go:71
      #	0x9c7ead	sync.(*RWMutex).RLock+0xad										/Users/chia7712/Library/go/default/src/sync/rwmutex.go:63
      #	0x9c7e33	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).IsLeafQueue+0x33		/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:667
      #	0x9c51f7	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).GetPartitionQueues+0x1f7	/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:426
      #	0x9c54aa	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).GetPartitionQueues+0x4aa	/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:416
      

            People

              Assignee: chia7712 Chia-Ping Tsai
              Reporter: chia7712 Chia-Ping Tsai
              Votes: 0
              Watchers: 2