  1. Ignite
  2. IGNITE-14138

Historical rebalance kills cluster

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10
    • Component/s: None
    • Labels: None
    • Fixed a node stop that occurred in some cases when historical rebalance could not find reserved WAL segments. The node now refuses to supply partitions historically instead.
    • Docs Required, Release Notes Required

    Description

      [2021-01-12T05:11:02,142][ERROR][rebalance-#508%---%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Failed to continue supplying [grp=SQL_USAGES_EPE, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]]]
      org.apache.ignite.IgniteCheckedException: Failed to continue supplying [grp=SQL_1, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
      	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:571) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:398) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:489) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:474) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109) [ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1707) [ignite-core.jar]
      	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721) [ignite-core.jar]
      	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:157) [ignite-core.jar]
      	at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:3011) [ignite-core.jar]
      	at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1662) [ignite-core.jar]
      	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4900(GridIoManager.java:157) [ignite-core.jar]
      	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1629) [ignite-core.jar]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      	at java.lang.Thread.run(Thread.java:834) [?:?]
      Caused by: org.apache.ignite.IgniteCheckedException: Could not find start pointer for partition [part=4, partCntrSince=1115]
      	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.searchEarliestWalPointer(CheckpointHistory.java:557) ~[ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:1121) ~[ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(IgniteCacheOffheapManagerImpl.java:1195) ~[ignite-core.jar]
      	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:322) ~[ignite-core.jar]
      	... 16 more
      

      I believe that it should throw IgniteHistoricalIteratorException instead of IgniteCheckedException, so that the error can be handled properly and the rebalance can fall back to a full rebalance instead of killing nodes.
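
      The fallback proposed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Ignite code: HistoryUnavailableException, searchEarliestWalPointer, supply and the TreeMap-based checkpoint history are hypothetical stand-ins, and the lookup rule (a checkpoint at or below the requested counter must exist) is an assumption made for the example. The point is only that a dedicated, recoverable exception type lets the supplier choose a full rebalance rather than escalate to the failure handler.

```java
import java.util.Map;
import java.util.TreeMap;

public class HistoricalRebalanceFallbackSketch {
    /** Hypothetical recoverable exception: the requested history is not available. */
    static class HistoryUnavailableException extends Exception {
        HistoryUnavailableException(String msg) {
            super(msg);
        }
    }

    /** How the supplier ends up serving the partition. */
    enum Mode { HISTORICAL, FULL }

    /**
     * Hypothetical lookup: maps an update counter to a WAL pointer.
     * Assumption: history is usable only if a checkpoint exists at or below cntrSince.
     */
    static long searchEarliestWalPointer(TreeMap<Long, Long> checkpoints, int part, long cntrSince)
        throws HistoryUnavailableException {
        Map.Entry<Long, Long> e = checkpoints.floorEntry(cntrSince);

        if (e == null) {
            throw new HistoryUnavailableException(
                "Could not find start pointer for partition [part=" + part
                    + ", partCntrSince=" + cntrSince + ']');
        }

        return e.getValue();
    }

    /** Tries the historical path first and falls back to a full rebalance on a recoverable miss. */
    static Mode supply(TreeMap<Long, Long> checkpoints, int part, long cntrSince) {
        try {
            searchEarliestWalPointer(checkpoints, part, cntrSince);

            return Mode.HISTORICAL;
        }
        catch (HistoryUnavailableException e) {
            // Recoverable: do not treat this as a critical error, serve the partition in full.
            return Mode.FULL;
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> checkpoints = new TreeMap<>();
        checkpoints.put(2000L, 42L); // Oldest retained checkpoint starts at counter 2000.

        System.out.println(supply(checkpoints, 4, 1115L)); // prints "FULL"
        System.out.println(supply(checkpoints, 4, 2500L)); // prints "HISTORICAL"
    }
}
```

      In the sketch, a generic checked exception would propagate out of supply and look like any other critical failure, while the narrower type can be caught exactly where a fallback is possible.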

            People

              v.pyatkov Vladislav Pyatkov
              v.pyatkov Vladislav Pyatkov
              Slava Koptilin Slava Koptilin

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m