Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Environment: dynamic ingest on a large cluster, Hadoop cdh3beta2
Description
The monitor was showing many errors for missing files. Analysis of any single file shows that the tablet was neither moved nor assigned to multiple servers. All the errors are for files that were minor compacted at a time when many NameNode operations were failing or being retried. The files were not deleted by the Accumulo garbage collector. The NameNode logs contain no mention of the file being created, but they do record the final rename of the file failing. Possible HDFS issue: the file open and write succeed, the close succeeds, and the file is then re-opened and checked, yet the file is never created.
The return code of the rename to bring the file online was not checked.
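
To illustrate the kind of check that was missing, here is a minimal sketch, not the actual Accumulo fix. It uses `java.io.File.renameTo` as a stand-in for Hadoop's `FileSystem.rename(Path, Path)`; both report failure by returning `false` rather than by throwing. The class and method names (`CheckedRename`, `renameOrThrow`) are hypothetical.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper: illustrates checking a boolean rename result.
final class CheckedRename {

    /**
     * Renames tmp to dest and throws if the rename reports failure
     * or if dest is not visible afterwards.
     */
    static void renameOrThrow(File tmp, File dest) throws IOException {
        // renameTo signals failure only through its return value,
        // so ignoring it silently loses the error.
        if (!tmp.renameTo(dest)) {
            throw new IOException("rename of " + tmp + " to " + dest + " failed");
        }
        // Extra sanity check: the destination should exist after a successful rename.
        if (!dest.exists()) {
            throw new IOException("rename reported success but " + dest + " is missing");
        }
    }
}
```

With a check like this, a failed rename surfaces as an exception at compaction time instead of as a missing-file error later.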