Details
- Type: New Feature
- Status: Resolved
- Priority: Trivial
- Resolution: Duplicate
- Affects Version: 0.9.1u1
- Fix Version: None
Description
I have written code that writes to HDFS in the style of:
roll(1000) [ escapedCustomDfs("hdfs://namenode/flume/file-%{rolltag}") ]
but writing each event body as the value in a SequenceFile (with NullWritable as the key).
As a possible scenario, there would be a client sending Flume events to an rpcSource, where the body of each event is a Thrift object. I have a sink that opens/closes a SequenceFile every X minutes, writes the body of each event (a Thrift object converted to bytes) as a BytesWritable value in the SequenceFile, and writes NullWritable as the key.
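For illustration, here is a minimal sketch (not the attached plugin code) of that write path using Hadoop's SequenceFile API: one file per roll period, NullWritable keys, and each event body wrapped in a BytesWritable value. The output path is an assumption; the real sink derives it from the escaped pattern and rolltag.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;

public class SeqfileWriteSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical rolled output path; the sink builds this from the pattern.
        Path path = new Path("hdfs://namenode/flume/file-00000000");
        FileSystem fs = path.getFileSystem(conf);

        // Opened when a roll starts, closed when the roll period elapses.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, NullWritable.class, BytesWritable.class);
        try {
            // In the sink this would be event.getBody(): the serialized Thrift object.
            byte[] body = new byte[0];
            writer.append(NullWritable.get(), new BytesWritable(body));
        } finally {
            writer.close();
        }
    }
}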
The purpose of this is that these SequenceFiles can be used as input to a MapReduce job. Each BytesWritable value can easily be deserialized in the map phase into a Thrift object (the one initially generated by the client and set as the event body).
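As an illustration of that map phase, the following is a minimal sketch, assuming the Thrift-generated TObject class from the attached tests and a binary-protocol TDeserializer; the output key (the object's string form) is just a placeholder.

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;

public class TObjectMapper extends Mapper<NullWritable, BytesWritable, Text, NullWritable> {
    private final TDeserializer deserializer = new TDeserializer();

    @Override
    protected void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // BytesWritable's backing array may be padded, so copy only getLength() bytes.
        byte[] bytes = Arrays.copyOf(value.getBytes(), value.getLength());
        TObject obj = new TObject(); // Thrift-generated class from the test zip
        try {
            deserializer.deserialize(obj, bytes);
        } catch (TException e) {
            throw new IOException("Could not deserialize event body", e);
        }
        context.write(new Text(obj.toString()), NullWritable.get());
    }
}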
The plugins (placed in the plugins folder of the zip) are:
com.cloudera.flume.handlers.hdfs.EscapedThriftSeqfileDfsSink
com.cloudera.flume.handlers.hdfs.ThriftSeqfileDfsSink
An example of node configuration:
node.name : tSource(38575) | roll(60000) { escapedThriftSeqfileDfsSink("hdfs://host.local/user/flume/file-%{rolltag}") }
The zip also contains a couple of tests:
1. A PHP client that generates Thrift objects and sends them via RPC to the source (the example objects are instances of the TObject class, generated with Thrift). This can be found in /test/php_rpc_writer.
2. A Java app that reads the SequenceFile generated by Flume. The file has to be downloaded manually from HDFS, and the path where it is placed must be specified in the app. This app has the same TObject class as the PHP client, so it can deserialize each BytesWritable from the SequenceFile into the proper Thrift object (a rough sketch of such a reader follows the note below).
*The examples contain hardcoded paths, which should be set appropriately.
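For reference, a minimal sketch of what such a reader might look like, again assuming the TObject class from the zip; the local path is a hypothetical stand-in for the manually downloaded file.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.thrift.TDeserializer;

public class SeqfileReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        // Hypothetical local copy of the rolled file downloaded from HDFS.
        Path path = new Path("/tmp/file-00000000");

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        TDeserializer deserializer = new TDeserializer();
        NullWritable key = NullWritable.get();
        BytesWritable value = new BytesWritable();
        try {
            while (reader.next(key, value)) {
                TObject obj = new TObject();
                // Copy only the valid portion of the BytesWritable buffer.
                deserializer.deserialize(obj, Arrays.copyOf(value.getBytes(), value.getLength()));
                System.out.println(obj);
            }
        } finally {
            reader.close();
        }
    }
}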
Attachments
Issue Links
- duplicates FLUME-635: Allow the compression codec to be specified in SequenceFiles (Closed)