Description
In HDFS-13175 it was discovered that when a DN reports the usage on a volume to be greater than the volume capacity, the disk balancer will fail with an unhelpful error:
$ hdfs diskbalancer -report -top 5
18/06/11 10:19:43 INFO command.Command: Processing report command
18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
In HDFS-13175, a change was made to include more details in the exception message, so after the change the code is:
public void setUsed(long dfsUsedSpace) {
  Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
      "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
      dfsUsedSpace, getCapacity());
  this.used = dfsUsedSpace;
}
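To illustrate what that precondition produces when a volume reports more used space than capacity, here is a minimal, self-contained sketch. `VolumeSketch` and its local `checkArgument` helper are hypothetical stand-ins written for this example (the real code uses Guava's `Preconditions.checkArgument`), so treat it as an approximation of the behaviour rather than the actual Hadoop class.

```java
// Hypothetical stand-in for DiskBalancerVolume, for illustration only.
class VolumeSketch {
    private final long capacity;
    private long used;

    VolumeSketch(long capacity) {
        this.capacity = capacity;
    }

    // Local approximation of Guava's Preconditions.checkArgument:
    // throws IllegalArgumentException with a formatted message when
    // the condition is false.
    private static void checkArgument(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    long getCapacity() {
        return capacity;
    }

    long getUsed() {
        return used;
    }

    void setUsed(long dfsUsedSpace) {
        // Same condition and message template as the snippet above.
        // Note the strict '<': used == capacity is rejected as well.
        checkArgument(dfsUsedSpace < getCapacity(),
            "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
            dfsUsedSpace, getCapacity());
        this.used = dfsUsedSpace;
    }

    public static void main(String[] args) {
        VolumeSketch volume = new VolumeSketch(100L);
        try {
            volume.setUsed(200L); // usage greater than capacity
        } catch (IllegalArgumentException ex) {
            // ex.toString() now carries the message, not just the class name.
            System.out.println(ex);
        }
    }
}
```

With the message in place, `ex.toString()` shows which values violated the check, but it still does not show where in the tool the exception was raised.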
There may, however, be other scenarios that cause the balancer to exit with an unhandled exception, and it would be helpful if the tool logged the full stack trace on error rather than just the exception name.
In DiskBalancerCLI.java, the relevant code is:
public static void main(String[] argv) throws Exception {
  DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
  int res = 0;
  try {
    res = ToolRunner.run(shell, argv);
  } catch (Exception ex) {
    LOG.error(ex.toString());
    res = 1;
  }
  System.exit(res);
}
We should change the catch block to log the full stack trace, which gives more information on all unhandled errors, e.g.:
LOG.error(ex.toString(), ex);
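The difference between the two logging calls can be sketched with the JDK alone. The example below assumes that `LOG` is a logger whose `error(String, Throwable)` overload prints the throwable's stack trace (as SLF4J-style loggers do); it mimics that with `printStackTrace`. `StackTraceLoggingSketch`, `failingCommand`, `nameOnly` and `withStackTrace` are hypothetical names used only for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration; not part of DiskBalancerCLI.
class StackTraceLoggingSketch {

    // Stand-in for a command that fails with a message-less exception,
    // like the IllegalArgumentException in the log output above.
    static void failingCommand() {
        throw new IllegalArgumentException();
    }

    // What LOG.error(ex.toString()) has to work with: class name
    // (plus message, if any), and nothing about where it was thrown.
    static String nameOnly(Exception ex) {
        return ex.toString();
    }

    // Roughly what a logger emits when the throwable is passed as well:
    // the toString() line followed by one "at ..." line per stack frame.
    static String withStackTrace(Exception ex) {
        StringWriter buffer = new StringWriter();
        ex.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            failingCommand();
        } catch (Exception ex) {
            System.out.println("Name only:");
            System.out.println(nameOnly(ex));
            System.out.println("With stack trace:");
            System.out.println(withStackTrace(ex));
        }
    }
}
```

The second form includes the frame for `failingCommand`, which is the kind of information needed to locate the failing check when the exception carries no message.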