Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.1.0, master
None
None
Description
In a 3C3D deployment, we ran a simulated query load and a simulated write load. The monitoring panel showed that only one of the three nodes was executing queries. After about 4 days, that node began to experience frequent full GCs (FGCs). Its CPU usage stayed at 100%, and both writes and queries on it became unresponsive. The other two nodes showed no abnormalities, but write traffic began to decline.

After we killed the query process, we watched the monitoring panel of the FGC-affected node. Within about 10 minutes, FGCs stopped, CPU utilization dropped below 40%, and write throughput started to rise. However, at this point every query from the query load that was executed through the CLI failed with the error shown in the following figure. In other words, writes recover from the FGCs, but queries do not.

After 24 hours, system CPU load was about 40%, FGCs were occurring frequently again, write throughput remained low, and queries still could not be executed.
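The report judges FGC frequency from the monitoring panel. To check the same symptom from inside a JVM, a minimal sketch can sample the standard `java.lang.management` garbage-collector beans and count old-generation collections between two snapshots. The class name `FullGcCounter`, the one-second sampling window, and the set of collector names treated as full GCs are assumptions for illustration, not part of the project's own tooling. Collectors such as ZGC or Shenandoah report different bean names and are not covered.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts old-generation ("full") collections over a time window.
public class FullGcCounter {
    // Assumption: HotSpot bean names for old-generation collections under the
    // Serial, Parallel, CMS and G1 collectors. For CMS, this also counts concurrent cycles.
    static final Set<String> OLD_GEN_COLLECTORS = Set.of(
            "MarkSweepCompact", "PS MarkSweep", "ConcurrentMarkSweep", "G1 Old Generation");

    // Snapshot the cumulative collection count of every collector in this JVM.
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    // Number of old-generation collections that happened between two snapshots.
    static long fullGcsBetween(Map<String, Long> before, Map<String, Long> after) {
        long total = 0;
        for (Map.Entry<String, Long> e : after.entrySet()) {
            if (!OLD_GEN_COLLECTORS.contains(e.getKey())) {
                continue; // skip young-generation collectors
            }
            long prev = before.getOrDefault(e.getKey(), 0L);
            // getCollectionCount() may return -1 when undefined; never report a negative delta.
            total += Math.max(0L, e.getValue() - prev);
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = snapshot();
        Thread.sleep(1000); // assumed sampling window of one second
        Map<String, Long> after = snapshot();
        System.out.println("Full GCs in the last second: " + fullGcsBetween(before, after));
    }
}
```

Run against a node's JVM (or exposed over JMX), a steadily non-zero count per window would correspond to the "frequent FGCs" described above.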