[ACCUMULO-2768] Agitator not restarting all datanodes - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.1, 1.6.0
Fix Version/s: 1.5.2, 1.6.1
Component/s: test
Labels:
- perl
Environment:

1.6.0 RC5, hadoop 2.2.0, ZK 3.4.5
20 node EC2 cluster

Description

I ran a 24-hour CI test against 1.6.0 RC5 with agitation.

I modified the agitation settings to the following:

#amount of time (in minutes) the agitator should sleep before killing
KILL_SLEEP_TIME=3

#amount of time (in minutes) the agitator should sleep after killing before running tup
TUP_SLEEP_TIME=1

#the minimum and maximum number of servers the agitator will kill at once
MIN_KILL=1
MAX_KILL=2
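
For a sense of what these settings imply, here is a rough bash sketch. It assumes each agitation cycle is simply the two sleeps back to back (the kill and restart steps themselves taking negligible time), and the helper `pick_kill_count` is a hypothetical name used for illustration, not a function from the agitator scripts.

```shell
#!/usr/bin/env bash
# Illustrative only: values copied from the settings above.
KILL_SLEEP_TIME=3   # minutes to sleep before killing
TUP_SLEEP_TIME=1    # minutes to sleep after killing, before restarting
MIN_KILL=1
MAX_KILL=2

# Hypothetical helper: pick how many servers to kill this round,
# roughly uniformly in [min, max], using bash's built-in $RANDOM.
pick_kill_count() {
  local min=$1 max=$2
  echo $(( min + RANDOM % (max - min + 1) ))
}

# Assumed cycle length: sleep, kill, sleep, restart.
cycle_minutes=$(( KILL_SLEEP_TIME + TUP_SLEEP_TIME ))
cycles_per_day=$(( 24 * 60 / cycle_minutes ))

echo "cycle=${cycle_minutes}min cycles_per_day=${cycles_per_day}"
echo "this round would kill $(pick_kill_count "$MIN_KILL" "$MAX_KILL") server(s)"
```

Under that assumption a 24-hour run goes through a few hundred kill/restart cycles, so even an occasional missed restart has many chances to accumulate.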

I started 3 walkers, all of which died. The walkers saw org.apache.accumulo.core.client.impl.AccumuloServerException. On the tserver, the cause was org.apache.hadoop.hdfs.BlockMissingException.

After stopping the agitation scripts, I ran start-dfs.sh and saw that it started 5 datanodes, which means 5 datanodes had been left down. Looking at datanode-agitator.pl, I think the problem is that when it kills two datanodes, it only restarts one.
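
The suspected pattern can be illustrated with a small bash simulation. This is not the code of datanode-agitator.pl or of the attached patch; the host names and function names are made up, no real processes are touched, and which of the killed hosts gets restarted in the faulty case is an arbitrary choice here.

```shell
#!/usr/bin/env bash
# Simulation only: "down" is a space-separated list of hosts marked as down.
down=""

kill_hosts() {            # mark every given host as down
  local h
  for h in "$@"; do down="$down $h"; done
}

mark_up() {               # remove host $1 from the down list
  local d new=""
  for d in $down; do
    [ "$d" = "$1" ] || new="$new $d"
  done
  down=$new
}

restart_one_only() {      # suspected faulty pattern: only one killed host comes back
  local h last=""
  for h in "$@"; do last=$h; done
  mark_up "$last"
}

restart_all() {           # intended pattern: every killed host comes back
  local h
  for h in "$@"; do mark_up "$h"; done
}

count_down() {            # number of hosts still down
  local d n=0
  for d in $down; do n=$((n + 1)); done
  echo "$n"
}

kill_hosts dn1 dn3
restart_one_only dn1 dn3
leaked_buggy=$(count_down)   # dn1 is still down

restart_all dn1 dn3
leaked_fixed=$(count_down)   # nothing is left down

echo "leaked_buggy=${leaked_buggy} leaked_fixed=${leaked_fixed}"
```

In the faulty variant, every round that kills two hosts leaves one behind, which would be consistent with start-dfs.sh later finding several datanodes to start.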

All of my ingest clients survived and were able to write 8 billion entries in this wacky environment. I noticed on the monitor that there were long periods of no ingest, but it was not a complete flat line.

Attachments

ACCUMULO-2768.patch
13/May/14 19:17
1 kB
Drew Farris

People

Assignee: Drew Farris
Reporter: Keith Turner
Votes: 0
Watchers: 3

Dates

Created: 01/May/14 16:58
Updated: 14/May/14 14:58
Resolved: 14/May/14 03:22