Flume - how to create a custom key for a HDFS SequenceFile?

Hello,

I'm using Flume's HDFS SequenceFile sink for writing data to HDFS.
I'm looking for a way to use custom keys. By default, Flume uses the timestamp as the key within a SequenceFile. However, in my use case I would like to use a custom string as the key instead of the timestamp.

What are best practices for implementing/configuring such a "custom key" within Flume?

Best,
Thomas

Re: Flume - how to create a custom key for a HDFS SequenceFile?

Per http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#hdfs-sink, see this option:

"""
Config: "serializer" (Default: TEXT) (Desc: Other possible options include avro_event or the fully-qualified class name of an implementation of the EventSerializer.Builder interface.)
"""

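The HDFS sink lets you plug in a custom serialiser class. For SequenceFile output specifically, the key and value classes come from a SequenceFileSerializer implementation, which the sink selects through its hdfs.writeFormat setting (Writable, Text, or the fully-qualified class name of a SequenceFileSerializer.Builder implementation), so that is the place to supply your own key type.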

This is the interface to implement for your custom class: https://github.com/cloudera/flume-ng/blob/cdh5.4.5-release/flume-ng-sinks/flume-hdfs-sink/src/main/j... (or extend one of the existing serialisers from here: https://github.com/cloudera/flume-ng/tree/cdh5.4.5-release/flume-ng-sinks/flume-hdfs-sink/src/main/j...)
For reference, this is the default Writable serialiser, showing both where it hooks in and its full implementation: https://github.com/cloudera/flume-ng/blob/cdh5.4.5-release/flume-ng-sinks/flume-hdfs-sink/src/main/j...
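
To make the idea concrete, here is a minimal sketch of the key-selection logic such a custom serialiser could contain. It is not Flume code: SketchEvent, SketchRecord and HeaderKeySerializer are hypothetical stand-ins so that it compiles without Flume or Hadoop on the classpath, and the "recordKey" header name is an assumption. In a real agent you would implement Flume's SequenceFileSerializer (plus its nested Builder) and return Hadoop Writable types, for example a Text key, instead of plain strings.

```java
import java.util.Collections;
import java.util.Map;

// Stand-in for a Flume event: string headers plus a byte[] body (hypothetical type).
final class SketchEvent {
    final Map<String, String> headers;
    final byte[] body;

    SketchEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }
}

// Stand-in for one SequenceFile record: the key and value that get written together.
final class SketchRecord {
    final String key;
    final byte[] value;

    SketchRecord(String key, byte[] value) {
        this.key = key;
        this.value = value;
    }
}

// Takes the record key from an event header instead of using a timestamp.
final class HeaderKeySerializer {
    // Assumed header name; an interceptor or the source would have to set it.
    static final String KEY_HEADER = "recordKey";
    // Used when the header is missing, so no event is silently dropped.
    static final String FALLBACK_KEY = "unknown";

    // Mirrors the one-event-in, records-out shape of a serialiser.
    Iterable<SketchRecord> serialize(SketchEvent event) {
        String key = event.headers.getOrDefault(KEY_HEADER, FALLBACK_KEY);
        return Collections.singletonList(new SketchRecord(key, event.body));
    }
}
```

With the real interface, the builder's fully-qualified class name would then go into the sink's hdfs.writeFormat property, with hdfs.fileType left at SequenceFile, and the jar containing it would need to be on the agent's classpath.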