Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
2.15
-
None
-
None
-
Observed in:
C# client and grid running on Linux in a container
C# client and grid running on Windows
-
Docs Required, Release Notes Required
Description
Using the Ignite C# client.
Given a running grid, having a client (and perhaps server) node in the grid attempt to create a cache using a DataRegionName that does not exist in the grid causes immediate failure in the client node with the following log output.
2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer] Completed partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode [id=9d5ed68d-38bb-447d-aed5-189f52660716, consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8, lastExchangeTime=1693112858024, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], rebalanced=false, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer] Exchange timings [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting in exchange queue" (14850 ms), stage="Exchange parameters initialization" (2 ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4 ms), stage="Total time" (14859 ms)]
2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer] Finished exchange init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], crd=false]
2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer] AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]
Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
---> Apache.Ignite.Core.Common.IgniteException: Failed to complete exchange process.
---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
at org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
at org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3832)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3813)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1796)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1053)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3348)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3182)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.base/java.lang.Thread.run(Thread.java:829)
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to initialize exchange locally [locNodeId=e9325b04-00fa-452e-9796-989b47b860ea]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1483)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:979)
... 4 more
Caused by: class org.apache.ignite.IgniteCheckedException: Requested DataRegion is not configured: Default-Mutable
at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.dataRegion(IgniteCacheDatabaseSharedManager.java:896)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:2463)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.getOrCreateCacheGroupContext(GridCacheProcessor.java:2181)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:1991)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1926)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$55a0e703$1(GridCacheProcessor.java:1801)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCachesIfPossible$16(GridCacheProcessor.java:1771)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCaches(GridCacheProcessor.java:1798)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareStartCachesIfPossible(GridCacheProcessor.java:1769)
at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:1000)
at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:886)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1472)
... 5 more
at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck()
at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallObjectMethod(GlobalRef obj, IntPtr methodId, Int64* argsPtr)
at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.TargetInStreamOutObject(GlobalRef target, Int32 opType, Int64 inMemPtr)
at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, Action`1 writeAction)
--- End of inner exception stack trace —
--- End of inner exception stack trace —
at Apache.Ignite.Core.Impl.PlatformJniTarget.InStreamOutObject(Int32 type, Action`1 writeAction)
at Apache.Ignite.Core.Impl.PlatformTargetAdapter.DoOutOpObject(Int32 type, Action`1 action)
at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration, PlatformCacheConfiguration platformCacheConfiguration, Op op)
at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration, PlatformCacheConfiguration platformCacheConfiguration)
at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration, NearCacheConfiguration nearConfiguration)
at Apache.Ignite.Core.Impl.Ignite.GetOrCreateCache[TK,TV](CacheConfiguration configuration)
This failure causes issues in the server nodes in the grid which now fail to restart with errors such as below (for the incorrectly create cache) but which are repeated for every defined cache in the grid:
2023-08-27 17:11:36,882 [42] INF [ImmutableCacheComputeServer] Can not finish proxy initialization because proxy does not exist, cacheName=SiteModelMetadata, localNodeId=3d4a75e8-174d-4947-877e-e45784d8d08d
2
At this point the grid is now unusable.
In summary: Attempted creation of a cache with an unknown DataRegionName causes immediate and unrecoverable failure in the entire grid.
On attempted restarted Ignite notes all caches (including system caches) as being "Started cache in recovery mode" in the log and then scans the WAL. The incorrectly created cache is not mentioned at this point.
At the point the cluster is activated, (ie: this appears in the log "Started state transition: activate cluster"), it states the incorrectly created cache ("SiteModelMetadata") can not be started, log entry is: "Cache can not be started : cache=SiteModelMetadata"
This is followed by multiple messages like this: "Finished recovery for cache [cache=ignite-sys-cache, grp=ignite-sys-cache, startVer=AffinityTopologyVersion [topVer=12, minorTopVer=1]] "
This is followed by errors relating to proxy initialisation, eg: "Can not finish proxy initialization because proxy does not exist, cacheName=ignite-sys-cache, localNodeId=4d44108f-cd96-4953-94db-6365f998a91b"
All caches are then stopped, eg: "Stopped cache [cacheName=ignite-sys-cache]", and the grid enters a relatively dormant inactivated state where it only emits Ignite heartbeat messages.
This bug appears to be trivially easy to reproduce by creating a cache with an unknown data region.
Attempting to destroy the bad cache in the grid with the Control.sh tool results in this output:
# ./control.sh --cache destroy --caches SiteModelMetadata WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.ignite.internal.util.GridUnsafe$2 (file:/trex/libs/ignite-core-2.15.0.jar) to field java.nio.Buffer.address WARNING: Please consider reporting this to the maintainers of org.apache.ignite.internal.util.GridUnsafe$2 WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future releaseWarning! The command will destroy 1 caches: SiteModelMetadata. If you continue, the cache data will be impossible to recover. Press 'y' to continue . . . y [01:53:38,925][SEVERE][session=24175683][CommandHandlerLog] Connection to cluster failed. Latest topology update failed.Control utility [ver. 2.15.0#20230425-sha1:f98f7f35] 2023 Copyright(C) Apache Software FoundationUser: rootTime: 2023-08-29T01:53:26.048 Command [CACHE] startedArguments: --cache destroy --caches SiteModelMetadata-------------------------------------------------------------------------------- Connection to cluster failed. Latest topology update failed.Command [CACHE] finished with code: Control utility has completed execution at: 2023-08-29T01:53:38.926
The Control.sh command to list caches also fails.