Flume / FLUME-232

TailSource will add newlines in the middle of log lines


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.9.2
    • Component/s: Sinks+Sources
    • Labels: None
    • Environment: Ubuntu Jaunty

    Description

      RandomAccessFile.readLine returns a line terminated by \n, \r, \r\n, or EOF. If the process writing to the log flushes in the middle of a line, the reader can reach EOF with an incomplete line; in that case TailSource should wait for an actual newline instead of emitting the fragment. But readLine strips the terminator, so the caller cannot tell whether a returned line ended in a newline or was cut off at EOF.
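      A minimal, deterministic reproduction of the problem described above (not TailSource's actual polling loop, and not Bob's setup): the hypothetical ReadLineSplitDemo appends half a line to a temp file, drains it with RandomAccessFile.readLine, then appends the rest and drains again. Because EOF counts as a terminator, the tailer sees two "lines".

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Flume.
final class ReadLineSplitDemo {

    // Appends text to the log, standing in for a writer that flushes mid-line.
    static void append(File f, String text) throws IOException {
        Files.write(f.toPath(), text.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
    }

    // Polls with RandomAccessFile.readLine while the writer flushes
    // "hello wor" and then "ld\n"; returns every line the tailer saw.
    static List<String> tailWithReadLine() throws IOException {
        File f = File.createTempFile("tail-demo", ".log");
        f.deleteOnExit();
        List<String> seen = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            append(f, "hello wor");      // first flush: half a line
            String line;
            while ((line = raf.readLine()) != null) {
                seen.add(line);          // EOF is treated as a terminator
            }
            append(f, "ld\n");           // second flush: rest of the line
            while ((line = raf.readLine()) != null) {
                seen.add(line);
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tailWithReadLine()); // [hello wor, ld]
    }
}
```

      One logical line, "hello world", comes out as two separate events, which is exactly the extra break seen in the logs.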

      I can reproduce this pretty consistently with the setup Bob posted here:
      https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/a32080ebc8a42596?pli=1
      I get spurious line breaks several times per minute using a 19 ms sleep.

      Since RandomAccessFile is not an InputStream, I don't think we can just wrap it in a BufferedReader or something similar. I could be wrong, though; my Java-fu is weak. I'm attaching a patch that adds a custom readLine method that only returns lines terminated by a real newline.
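      The attached patch is not reproduced here. As a hedged sketch of the approach it describes, assuming UTF-8 log content, the hypothetical readCompleteLine below reads byte by byte and returns a line only once it sees '\n' (stripping a preceding '\r', so \r\n also works). If EOF comes first, it seeks back to where the line started and returns null, so the next poll retries from the same offset. A bare '\r' is deliberately not treated as a terminator.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; names and UTF-8 decoding are assumptions.
final class NewlineOnlyReader {

    // Returns the next '\n'-terminated line, or null (after rewinding to the
    // start of the partial line) if EOF is reached first. One byte per read():
    // simple, but slow.
    static String readCompleteLine(RandomAccessFile raf) throws IOException {
        long start = raf.getFilePointer();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = raf.read()) != -1) {
            if (b == '\n') {
                byte[] bytes = buf.toByteArray();
                int len = bytes.length;
                if (len > 0 && bytes[len - 1] == '\r') {
                    len--;                       // strip the '\r' of "\r\n"
                }
                return new String(bytes, 0, len, StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        raf.seek(start);                         // partial line: rewind and wait
        return null;
    }

    static void append(File f, String text) throws IOException {
        Files.write(f.toPath(), text.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
    }

    // Flushes "hello wor", polls, flushes "ld\r\n", polls; returns both results.
    static String[] demo() throws IOException {
        File f = File.createTempFile("tail-fixed", ".log");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            append(f, "hello wor");
            String first = readCompleteLine(raf);   // null: no newline yet
            append(f, "ld\r\n");
            String second = readCompleteLine(raf);  // "hello world"
            return new String[] {first, second};
        }
    }

    public static void main(String[] args) throws IOException {
        String[] r = demo();
        System.out.println(r[0] + " / " + r[1]); // null / hello world
    }
}
```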

      I wouldn't actually recommend applying this patch as-is; it's sort of just a proof of concept. I'm working on a more efficient implementation.
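      One possible shape for a more efficient version, again a hypothetical sketch rather than what was eventually committed: read the file in 8 KB chunks and keep any trailing partial line in memory between polls, so nothing is re-read and no seek is needed. Splitting on the byte 0x0A is safe for UTF-8 (an assumption about the log encoding) because that byte never occurs inside a multi-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffered tailer; not Flume's API.
final class BufferedTailReader {
    private final RandomAccessFile raf;
    private final byte[] chunk = new byte[8192];
    // Bytes of the current line seen so far, carried across polls.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    BufferedTailReader(RandomAccessFile raf) {
        this.raf = raf;
    }

    // Returns every complete line available now; may return an empty list.
    List<String> poll() throws IOException {
        List<String> lines = new ArrayList<>();
        int n;
        while ((n = raf.read(chunk)) > 0) {
            int lineStart = 0;
            for (int i = 0; i < n; i++) {
                if (chunk[i] == '\n') {
                    pending.write(chunk, lineStart, i - lineStart);
                    lines.add(finishLine());
                    lineStart = i + 1;
                }
            }
            pending.write(chunk, lineStart, n - lineStart); // keep partial tail
        }
        return lines;
    }

    // Decodes the pending bytes as one line (minus a trailing '\r') and clears them.
    private String finishLine() {
        byte[] bytes = pending.toByteArray();
        pending.reset();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    // Appends three flushes that split lines at awkward points, polling after each.
    static List<List<String>> demo() throws IOException {
        File f = File.createTempFile("tail-buffered", ".log");
        f.deleteOnExit();
        List<List<String>> polls = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            BufferedTailReader reader = new BufferedTailReader(raf);
            for (String write : new String[] {"a\nhello wor", "ld\r\nb", "\n"}) {
                Files.write(f.toPath(), write.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.APPEND);
                polls.add(reader.poll());
            }
        }
        return polls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // [[a], [hello world], [b]]
    }
}
```

      Each poll returns only complete lines: "hello wor" and later "b" wait in the pending buffer until their newline arrives.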


            People

              Jonathan Hsieh (jmhsieh)
              flume_kevin (disabled imported user)
              Votes: 0
              Watchers: 0
