Cassandra / CASSANDRA-5204

The first three Cassandra nodes are very busy; GC pauses the world (real production environment experience)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Invalid
    • Affects Version/s: 1.0.10
    • Fix Version/s: None
    • Component/s: None
    • Environment:
      Cassandra 1.1.5 release
      CentOS 5.5
      JDK 1.7u9
      VMware ESXi-based VMs: 30GB RAM, 4 x 4-core CPUs
      Hardware: Dell R720, 2 x 6-core CPUs, 128GB RAM; 3 such nodes built as above
      Data hosted by each node: about 8GB

    Description

Hi all,

      I previously had 10 nodes, all CentOS VMs with 16GB RAM and 8-core CPUs, running Cassandra 1.1.5 with only one user keyspace (RF=3). Heap: 8GB old generation, 2GB new generation.
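
      For context on that keyspace: it has RF=3. Below is a minimal sketch of how such a keyspace can be defined through Hector; the cluster name, host, keyspace name, and the use of SimpleStrategy are placeholders and assumptions, not values taken from this report.

      {code:java}
      import java.util.Collections;

      import me.prettyprint.cassandra.service.CassandraHostConfigurator;
      import me.prettyprint.hector.api.Cluster;
      import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
      import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
      import me.prettyprint.hector.api.factory.HFactory;

      public class KeyspaceSetupSketch {
          public static void main(String[] args) {
              // Placeholder cluster name and host; the real values are not in this report.
              Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                      new CassandraHostConfigurator("node1:9160"));

              // A single user keyspace with replication factor 3, as described above.
              KeyspaceDefinition ksDef = HFactory.createKeyspaceDefinition(
                      "MyKeyspace",                                   // assumed name
                      "org.apache.cassandra.locator.SimpleStrategy",  // assumed strategy
                      3,                                              // RF=3 from the report
                      Collections.<ColumnFamilyDefinition>emptyList());
              cluster.addKeyspace(ksDef);
          }
      }
      {code}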

      Symptoms:
      1. The first three nodes (starting from token 0) are very busy all the time, while the remaining 7 nodes seem to have nothing to do; both their CPU and RAM are mostly idle.

      2. JVM memory usage on all of the first three nodes grows rapidly, and CMS GC fires nearly every second.

      3. When GC happens, the world seems to stop. Checking via nodetool: running nodetool on one of the first three nodes hangs; running it on the remaining 7 nodes shows the first three nodes as down.

      4. When GC finishes, the node comes back, but it goes down again minutes later.

      5. Killing the Java process and rebooting the frozen node brings it back up within minutes, but the JVM memory fills up again within minutes as well, and everything above repeats.

      6. Even if only one of the first three nodes is frozen, client requests fail, although my clients request at CL=QUORUM using the Hector client library (see the sketch after this list).

      7. Disabling the three nodes' Thrift API changed nothing.
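
      For item 6: a minimal sketch of how a Hector client can be configured to read and write at QUORUM. The cluster name, host list, and keyspace name are placeholders, not the real values from our deployment.

      {code:java}
      import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
      import me.prettyprint.cassandra.service.CassandraHostConfigurator;
      import me.prettyprint.hector.api.Cluster;
      import me.prettyprint.hector.api.HConsistencyLevel;
      import me.prettyprint.hector.api.Keyspace;
      import me.prettyprint.hector.api.factory.HFactory;

      public class QuorumClientSketch {
          public static Keyspace connect() {
              // Placeholder cluster name and host list.
              Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                      new CassandraHostConfigurator("node1:9160,node2:9160,node3:9160"));

              // Read and write at QUORUM, as the clients in this report do.
              ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
              ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
              ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

              return HFactory.createKeyspace("MyKeyspace", cluster, ccl);
          }
      }
      {code}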

      --------- Changes ---------
      0. Stopped incoming user requests (shut down our user service to leave Cassandra idle).
      1. Decommissioned 4 nodes (one by one).
      2. Moved tokens to balance the remaining 6 nodes, one by one (see the token arithmetic after this list).
      3. Changed the remaining 6 nodes' resources to 30GB RAM and 16-core CPU, with a heap of 16GB old generation and 4GB new generation.
      4. Enabled JNA.
      5. Ran a major compaction on the 6 nodes, then a repair on the 6 nodes.
      6. Started the new cluster.
      7. Everything seemed fine early on, but after 5 hours all of the bad symptoms came back.
      8. Because we now have double the RAM, the dead repeating cycle now runs hourly.
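
      For item 2: with the default RandomPartitioner (an assumption; the partitioner is not stated in this report), a balanced layout for the 6 remaining nodes puts node i at token i * 2^127 / 6. A small sketch of that arithmetic; each resulting token would be applied with "nodetool move <token>" on the matching node.

      {code:java}
      import java.math.BigInteger;

      public class BalancedTokens {
          // For RandomPartitioner the token range is [0, 2^127); an even spread
          // assigns node i the token i * 2^127 / N.
          public static void main(String[] args) {
              int nodeCount = 6;  // the 6 remaining nodes described above
              BigInteger range = BigInteger.ONE.shiftLeft(127);
              for (int i = 0; i < nodeCount; i++) {
                  BigInteger token = range.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(nodeCount));
                  System.out.println("node " + (i + 1) + " -> " + token);
              }
          }
      }
      {code}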

      Some screenshots are attached.

    Attachments

    People

      Assignee: Unassigned
      Reporter: sunjian (sun74533)
      Votes: 0
      Watchers: 1
