  Ignite / IGNITE-19115

Possible deadlock in handling pending cache messages when the cache is recreated

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.14
    • Fix Version/s: 2.15
    • Component/s: None
    • Labels: None

    Description

      Let's consider the following scenario:
      Precondition:
      There is a cluster of two server nodes (node A, the coordinator, and node B) and an atomic cache that resides on both nodes.
      The current topology version is (x, y).

      Node B initiates putting a new key-value pair into the atomic cache. Let's assume the primary partition that the key belongs to resides on node A.

      The previous step requires acquiring a gateway lock for the corresponding cache (the GridCacheGateway read lock) and registering a GridNearAtomicSingleUpdateFuture in the MVCC manager. It is important to note that the cache future does not acquire the topology lock and so should not block partition map exchange (PME).

      Concurrently, node A initiates destroying the cache. The corresponding PME completes successfully on the coordinator node but is blocked on node B because the gateway is already acquired. The blocked thread on node B is waiting in GridCacheGateway.onStopped:

      Thread [name="sys-#105%dht.IgniteCacheRecreateTest1%", id=123, state=TIMED_WAITING, blockCnt=0, waitCnt=350]
              at java.lang.Thread.sleep(Native Method)
              at o.a.i.i.util.IgniteUtils.sleep(IgniteUtils.java:8316)
              at o.a.i.i.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:324)
              at o.a.i.i.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2582)
              at o.a.i.i.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$1c59e5cf$1(GridCacheProcessor.java:2776)
              at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$714/770930142.apply(Unknown Source)
              at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11628)
              at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11530)
              at o.a.i.i.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2755)
              at o.a.i.i.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2945)
              at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2528)
              at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4785)
              at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:161)
              at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4453)
              at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4441)
              at o.a.i.i.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:464)
              at o.a.i.i.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
              at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4441)
              at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1991)
              at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:469)
              at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:454)
              at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3765)
              at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3744)
              at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
              at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
              at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
              at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
              at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
              at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
              at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
              at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
              at o.a.i.i.managers.communication.GridIoManager.access$5300(GridIoManager.java:243)
              at o.a.i.i.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
              at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:750)
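
To make the blocking visible in the trace above more concrete, here is a minimal Java sketch that models the gateway as a plain read-write lock. `GatewaySketch` and all of its methods are hypothetical names invented for illustration; this is not Ignite's `GridCacheGateway` API. It also assumes that the stop path needs exclusive (write) access. The report itself only states that the in-flight update holds the gateway read lock and that the stop has to wait for it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a cache gateway (hypothetical, not Ignite code). */
final class GatewaySketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** A cache operation (for example, the put on node B) enters the gateway. */
    void enter() {
        lock.readLock().lock();
    }

    /** The operation leaves the gateway once its update future completes. */
    void leave() {
        lock.readLock().unlock();
    }

    /** Stop path (assumed to need the write lock): false if operations are still inside. */
    boolean tryStop(long timeoutMs) {
        try {
            return lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Runs the stop attempt on another thread, as the exchange worker would. */
    static boolean tryStopFromAnotherThread(GatewaySketch gw, long timeoutMs) {
        return CompletableFuture.supplyAsync(() -> gw.tryStop(timeoutMs)).join();
    }

    public static void main(String[] args) {
        GatewaySketch gw = new GatewaySketch();

        gw.enter(); // the update is in flight and holds the gateway
        System.out.println("stop while update in flight: " + tryStopFromAnotherThread(gw, 100));

        gw.leave(); // the update future completed
        System.out.println("stop after update completed: " + tryStopFromAnotherThread(gw, 100));
    }
}
```

In this model the first stop attempt times out and the second succeeds, which mirrors why the cache stop on node B cannot finish until the pending update future is completed.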
      

      Node A initiates creating a new cache with the same name as the one previously destroyed.

      Node A receives the cache update message but cannot process it, because a new cache (a cache with the same cacheId) is starting, so processing of this message has to be postponed until PME is completed. In this case a GridDhtForceKeysFuture is created, and the message will not be processed until PME is completed. As a result, the near node will not receive a response and will not be able to complete the previous exchange future (see IGNITE-10251).
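
The postponement described in this step can be pictured with a small Java sketch in which a message handler is chained onto an "exchange done" future. `DeferredMessageSketch`, `onMessage` and `completeExchange` are hypothetical names; the sketch uses the JDK's `CompletableFuture` as a stand-in and does not reproduce how Ignite or `GridDhtForceKeysFuture` actually defers messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of deferring a cache message until an exchange completes (hypothetical). */
final class DeferredMessageSketch {
    /** Stand-in for the exchange (PME) future of the cache that is starting. */
    private final CompletableFuture<Void> exchangeDone = new CompletableFuture<>();

    /** Messages that have actually been processed. */
    private final List<String> processed = new ArrayList<>();

    /** The message is not processed now; processing is chained onto the exchange future. */
    void onMessage(String msg) {
        exchangeDone.thenRun(() -> processed.add(msg));
    }

    /** Completing the exchange releases every deferred message. */
    void completeExchange() {
        exchangeDone.complete(null);
    }

    List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        DeferredMessageSketch nodeA = new DeferredMessageSketch();

        nodeA.onMessage("update from node B");
        System.out.println("before exchange completes: " + nodeA.processed());

        nodeA.completeExchange();
        System.out.println("after exchange completes: " + nodeA.processed());
    }
}
```

As long as `completeExchange` is never called, the sender of the message never gets a reply, which is the situation the near node (node B) ends up in here.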

      The new PME on node B cannot proceed because the previous PME (the cache destroy exchange) is still blocked on that node.
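
Taken together, the steps form a circular wait. The Java sketch below encodes one reading of the scenario as a "waits for" map and checks it for a cycle. The node labels and the class `WaitForCycleSketch` are illustrative assumptions, not Ignite structures, and the edges are an interpretation of the description rather than something extracted from the code.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy wait-for graph for the scenario in this issue (hypothetical labels). */
final class WaitForCycleSketch {
    /** Each entry reads "key cannot finish until value finishes". */
    static Map<String, String> scenario() {
        Map<String, String> waitsFor = new LinkedHashMap<>();
        // The destroy PME on node B waits for the gateway held by the update.
        waitsFor.put("destroy PME on node B", "update future on node B");
        // The update future waits for a response from the primary node A.
        waitsFor.put("update future on node B", "update response from node A");
        // Node A postpones the update message until the create PME completes.
        waitsFor.put("update response from node A", "create PME");
        // The create PME cannot proceed on node B until the destroy PME is done there.
        waitsFor.put("create PME", "destroy PME on node B");
        return waitsFor;
    }

    /** Follows the single outgoing edge from each node; a revisit means a cycle. */
    static boolean hasCycle(Map<String, String> waitsFor, String start) {
        Set<String> seen = new HashSet<>();
        for (String cur = start; cur != null; cur = waitsFor.get(cur)) {
            if (!seen.add(cur))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> graph = scenario();
        System.out.println("deadlock: " + hasCycle(graph, "create PME"));

        // Breaking any single edge (here: node A answers without waiting) removes the cycle.
        graph.remove("update response from node A");
        System.out.println("deadlock after breaking one edge: " + hasCycle(graph, "create PME"));
    }
}
```

The second check illustrates that removing any one of the four dependencies is enough to break the cycle in this model; which dependency the actual fix removes is not stated in the description.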

          People

            slava.koptilin Vyacheslav Koptilin
            slava.koptilin Vyacheslav Koptilin
            Ivan Daschinsky