Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-20299

Creating a cache with an unknown data region name causes total unrecoverable failure of the grid

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.15
    • None
    • cache
    • None
    • Observed in:

      C# client and grid running on Linux in a container

      C# client and grid running on Windows

       

    • Docs Required, Release Notes Required

    Description

      Using the Ignite C# client.
       
      Given a running grid, having a client (and perhaps server) node in the grid attempt to create a cache using a DataRegionName that does not exist in the grid causes immediate failure in the client node with the following log output. 
       
      2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Completed partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode [id=9d5ed68d-38bb-447d-aed5-189f52660716, consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8, lastExchangeTime=1693112858024, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], rebalanced=false, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
      2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Exchange timings [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting in exchange queue" (14850 ms), stage="Exchange parameters initialization" (2 ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4 ms), stage="Total time" (14859 ms)]
      2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer]   Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
      2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer]   Finished exchange init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], crd=false]
      2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer]   AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]
      Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
       ---> Apache.Ignite.Core.Common.IgniteException: Failed to complete exchange process.
       ---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
              at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
              at org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
              at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
              at org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
              at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
      Caused by: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
              at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
              at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
              at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3832)
              at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3813)
              at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1796)
              at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1053)
              at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3348)
              at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3182)
              at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
              at java.base/java.lang.Thread.run(Thread.java:829)
              Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to initialize exchange locally [locNodeId=e9325b04-00fa-452e-9796-989b47b860ea]
                      at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1483)
                      at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:979)
                      ... 4 more
              Caused by: class org.apache.ignite.IgniteCheckedException: Requested DataRegion is not configured: Default-Mutable
                      at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.dataRegion(IgniteCacheDatabaseSharedManager.java:896)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:2463)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.getOrCreateCacheGroupContext(GridCacheProcessor.java:2181)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:1991)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1926)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$55a0e703$1(GridCacheProcessor.java:1801)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCachesIfPossible$16(GridCacheProcessor.java:1771)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCaches(GridCacheProcessor.java:1798)
                      at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCachesIfPossible(GridCacheProcessor.java:1769)
                      at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:1000)
                      at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:886)
                      at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1472)
                      ... 5 more

         at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck()
         at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallObjectMethod(GlobalRef obj, IntPtr methodId, Int64* argsPtr)
         at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.TargetInStreamOutObject(GlobalRef target, Int32 opType, Int64 inMemPtr)
         at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, Action`1 writeAction)
         --- End of inner exception stack trace —
         --- End of inner exception stack trace —
         at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, Action`1 writeAction)
         at Apache.Ignite.Core.Impl.PlatformTargetAdapter.DoOutOpObject(Int32 type, Action`1 action)
         at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration, PlatformCacheConfiguration platformCacheConfiguration, Op op)
         at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration, PlatformCacheConfiguration platformCacheConfiguration)
         at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration)
         at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration)
       
       
      This failure causes issues in the server nodes in the grid which now fail to restart with errors such as below (for the incorrectly create cache) but which are repeated for every defined cache in the grid:
       
      2023-08-27 17:11:36,882 [42] INF [ImmutableCacheComputeServer]   Can not finish proxy initialization because proxy does not exist, cacheName=SiteModelMetadata, localNodeId=3d4a75e8-174d-4947-877e-e45784d8d08d
      2
       
      At this point the grid is now unusable.
       
      In summary: Attempted creation of a cache with an unknown DataRegionName causes immediate and unrecoverable failure in the entire grid. 
       
       On attempted restarted Ignite notes all caches (including system caches) as being "Started cache in recovery mode" in the log and then scans the WAL. The incorrectly created cache is not mentioned at this point.

      At the point the cluster is activated, (ie: this appears in the log "Started state transition: activate cluster"), it states the incorrectly created cache ("SiteModelMetadata") can not be started, log entry is: "Cache can not be started : cache=SiteModelMetadata"

      This is followed by multiple messages like this: "Finished recovery for cache [cache=ignite-sys-cache, grp=ignite-sys-cache, startVer=AffinityTopologyVersion [topVer=12, minorTopVer=1]] "

       

      This is followed by errors relating to proxy initialisation, eg: "Can not finish proxy initialization because proxy does not exist, cacheName=ignite-sys-cache, localNodeId=4d44108f-cd96-4953-94db-6365f998a91b"

       

      All caches are then stopped, eg: "Stopped cache [cacheName=ignite-sys-cache]", and the grid enters a relatively dormant inactivated state  where it only emits Ignite heartbeat messages.

       

      This bug appears to be trivially easy to reproduce by creating a cache with an unknown data region.

       

      Attempting to destroy the bad cache in the grid with the Control.sh tool results in this output:

       

      # ./control.sh --cache destroy --caches SiteModelMetadata
      WARNING: An illegal reflective access operation has occurred
      WARNING: Illegal reflective access by org.apache.ignite.internal.util.GridUnsafe$2 (file:/trex/libs/ignite-core-2.15.0.jar) to field java.nio.Buffer.address
      WARNING: Please consider reporting this to the maintainers of org.apache.ignite.internal.util.GridUnsafe$2
      WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
      WARNING: All illegal access operations will be denied in a future releaseWarning! 
      The command will destroy 1 caches: SiteModelMetadata.
      If you continue, the cache data will be impossible to recover.
      Press 'y' to continue . . . y
      [01:53:38,925][SEVERE][session=24175683][CommandHandlerLog] Connection to cluster failed. Latest topology update failed.Control utility [ver. 2.15.0#20230425-sha1:f98f7f35]
      2023 Copyright(C) Apache Software FoundationUser: rootTime: 2023-08-29T01:53:26.048
      Command [CACHE] startedArguments: --cache destroy --caches SiteModelMetadata--------------------------------------------------------------------------------
      Connection to cluster failed. 
      Latest topology update failed.Command [CACHE] finished with code: 
      Control utility has completed execution at: 2023-08-29T01:53:38.926
      

       

       The Control.sh command to list caches also fails.

       

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            rpwilson Raymond Wilson
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: