Flafka

Hi, we are using Cloudera 5.4 and Flafka to stream data from a source database into Hadoop.

Basically, the data is ingested into Kafka as plain strings, and the source DB table name is used as the key for each message.

We are using the Flume HDFS sink to store the messages in Hadoop.

Our configuration is similar to the one documented here (Kafka channel):

http://www.cloudera.com/content/www/en-us/documentation/kafka/latest/topics/kafka_flume.html

except that we want to store the messages from each DB table in their own file. Since each message in Kafka is keyed by the DB table, I'm hoping I can do something like the following:

tier1.sinks.sink1.hdfs.path = /tmp/kafka/%{topic}/%{messageKey}.csv

Can anybody please let me know whether there is a way to store Kafka messages in HDFS using the message key as the file name?

Thanks

Re: Flafka

Cloudera Employee

Hi Leon,

When the Kafka source reads from Kafka, it looks for the message key and sets it in the "key" header.

So you should be able to do this:

tier1.sinks.sink1.hdfs.path = /tmp/kafka/%{topic}/%{key}.csv

But I haven't tested this.
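
A minimal sketch of that approach, assuming a Kafka source feeding a memory channel and an HDFS sink, and likewise untested. The ZooKeeper host, topic name, group ID, channel sizes, and roll settings are placeholders, and the source property names are the Flume 1.6-era ones, which may differ in other versions. One caveat: hdfs.path names a directory, while the file name is built from hdfs.filePrefix and hdfs.fileSuffix, so the key goes into the prefix here rather than into the path.

```properties
# Kafka source -> memory channel -> HDFS sink (untested sketch)
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# Kafka source: expected to set the "topic" and "key" headers on each event.
# Host, topic, and group ID below are placeholders.
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = zk01.example.com:2181
tier1.sources.source1.topic = dbchanges
tier1.sources.source1.groupId = flume
tier1.sources.source1.channels = channel1

# Memory channel: keeps the headers intact between source and sink.
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

# HDFS sink: one directory per topic, file name prefixed with the message key.
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = /tmp/kafka/%{topic}
tier1.sinks.sink1.hdfs.filePrefix = %{key}
tier1.sinks.sink1.hdfs.fileSuffix = .csv
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.writeFormat = Text
# Roll on time only (every 5 minutes); 0 disables size- and count-based rolling.
tier1.sinks.sink1.hdfs.rollInterval = 300
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
```

With this layout the sink should write files named roughly <key>.<counter>.csv under /tmp/kafka/<topic>/, since it inserts a numeric counter between the prefix and the suffix. As far as I know, messages without a key get no "key" header, so the prefix would come out empty for them.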

 

If you are using the Kafka Channel and going directly to HDFS, then you're out of luck: the channel does not really use the headers or message keys for text, or for any messages that are not Flume Avro events.

Thanks

Jeff