Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 0.9.1
- None
- ubuntu jaunty
Description
RandomAccessFile.readLine returns a line terminated by \n, \r\n, or EOF. If the process writing to the log flushes in the middle of a line, the reader can reach EOF with an incomplete line, so the tailsource should wait for an actual newline. But readLine strips the terminator, so you can't tell whether the line ended at a real newline or at EOF.
I can reproduce this pretty consistently with the setup bob posted here:
https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/a32080ebc8a42596?pli=1
I get extra breaks several times per minute using 19ms sleep.
Since RandomAccessFile is not an InputStream, I don't think we can just wrap it in a bufferedlinereader or something similar. I could be wrong, though; my java-fu is weak. I'm attaching a patch that adds a custom readLine function that only returns lines terminated by a real newline.
I wouldn't actually recommend applying this patch as-is; it's sort of just a proof of concept. I'm working on a more efficient implementation.
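For illustration only (this is not the attached patch), here is a minimal sketch of the idea: read bytes until a real \n shows up, and if EOF comes first, seek back to the start of the partial line and return null so the caller can sleep and retry. The class name CompleteLineReader and the method readCompleteLine are hypothetical. The ISO-8859-1 decoding is an assumption chosen to match how RandomAccessFile.readLine maps each byte to one char. Reading one byte at a time is slow, so treat this as a proof of concept as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flume or the attached patch.
final class CompleteLineReader {

    /**
     * Returns the next line only if it is terminated by '\n' (a preceding
     * '\r' is stripped too). If EOF is reached before a newline, the file
     * pointer is rewound to the start of the partial line and null is
     * returned, so a later call sees the whole line once it is written.
     */
    static String readCompleteLine(RandomAccessFile raf) throws IOException {
        long start = raf.getFilePointer();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = raf.read()) != -1) {
            if (b == '\n') {
                byte[] bytes = buf.toByteArray();
                int len = bytes.length;
                if (len > 0 && bytes[len - 1] == '\r') {
                    len--; // treat \r\n as a single terminator
                }
                // Assumption: one byte per char, like RandomAccessFile.readLine.
                return new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
            }
            buf.write(b);
        }
        // EOF before a newline: the writer may have flushed mid-line.
        raf.seek(start);
        return null;
    }
}
```

A tail loop would call readCompleteLine repeatedly and sleep whenever it returns null, instead of emitting whatever readLine happened to return at EOF.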
Attachments
Issue Links
- relates to FLUME-252 Update Tail to get rid of races and truncation problems. (Closed)