Description
I ran a 24-hour continuous ingest (CI) test against 1.6.0 RC5 with agitation.
I modified the agitation settings to the following:
#time amount of time (in minutes) the agitator should sleep before killing
KILL_SLEEP_TIME=3
#time amount of time (in minutes) the agitator should sleep after killing before running tup
TUP_SLEEP_TIME=1
#the minimum and maximum server the agitator will kill at once
MIN_KILL=1
MAX_KILL=2
I started three walkers, all of which died. The walkers saw org.apache.accumulo.core.client.impl.AccumuloServerException. On the tserver, the cause was org.apache.hadoop.hdfs.BlockMissingException.
After stopping the agitation scripts, I ran start-dfs.sh and saw that it started 5 datanodes, so 5 datanodes had been left down. Looking at datanode-agitator.pl, I think the problem is that when it kills two datanodes, it only restarts one.
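To illustrate why restarting only one of two killed datanodes would leave nodes down over a long run, here is a minimal simulation sketch. It is NOT the logic of datanode-agitator.pl; it assumes a hypothetical loop that kills between MIN_KILL and MAX_KILL datanodes per cycle and then restarts them. The names agitate_once, simulate and restart_all are made up for this example. 360 cycles roughly matches 24 hours at KILL_SLEEP_TIME=3 and TUP_SLEEP_TIME=1 (24 * 60 / (3 + 1) = 360), ignoring the time spent killing and restarting.

```python
import random

# Hypothetical model of a datanode agitator loop. This is NOT the code in
# datanode-agitator.pl, only an illustration of the bookkeeping it needs.
MIN_KILL = 1
MAX_KILL = 2


def agitate_once(running, rng, restart_all=True):
    """Kill between MIN_KILL and MAX_KILL running nodes, then restart them.

    With restart_all=False the function mimics the suspected bug: only the
    first of the killed nodes is brought back.
    """
    count = rng.randint(MIN_KILL, min(MAX_KILL, len(running)))
    victims = rng.sample(sorted(running), count)
    running -= set(victims)  # "kill" the chosen datanodes
    # (a real agitator would sleep TUP_SLEEP_TIME minutes here)
    to_restart = victims if restart_all else victims[:1]
    running |= set(to_restart)  # restart the killed datanodes
    return running


def simulate(num_nodes, cycles, restart_all, seed=0):
    """Run the agitation loop and return how many datanodes are left dead."""
    rng = random.Random(seed)
    running = set(range(num_nodes))
    for _ in range(cycles):
        if not running:
            break
        running = agitate_once(running, rng, restart_all)
    return num_nodes - len(running)
```

With restart_all=True, simulate(10, 360, restart_all=True) always returns 0. With restart_all=False, every cycle that kills two nodes leaks one, so the count of dead datanodes keeps growing until only one node is left running.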
All of my ingest clients survived and wrote 8 billion entries in this wacky environment. On the monitor I noticed long periods with no ingest, though the ingest rate never went completely flat.