Kafka->HDFS pipeline

Expert Contributor

Is there any documentation on how to use Kafka to write to HDFS? I'm aware of Camus but not sure how to set it up in the CDH environment. It would also be great if you could show how to consume from Kafka (JSON or other formats) and write to HDFS in Parquet format.

Thanks!

3 REPLIES

Re: Kafka->HDFS pipeline

Cloudera Employee

Hi Buntu,

One way of getting data from Kafka to HDFS/HBase is via Flume, i.e. Kafka --> Flume --> HDFS/HBase.

You can use Flume's Kafka-Source to read from Kafka (https://issues.apache.org/jira/browse/FLUME-2250), and then use Flume sinks to write to HDFS/HBase. Flume's Kafka-Source is available in CDH 5.2.
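
As a rough illustration of the Kafka --> Flume --> HDFS path described above, here is a minimal sketch of a Flume agent configuration. The agent name (tier1), the component names, the ZooKeeper host, the topic, the consumer group, and the HDFS path are all hypothetical placeholders. The Kafka source property names (zookeeperConnect, topic, groupId) are the ones used by the early Kafka source; later Flume releases renamed them, so check the Flume User Guide for the version you run before relying on this.

```properties
# Hypothetical agent "tier1": Kafka source -> memory channel -> HDFS sink.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# Kafka source. Property names are version-dependent (assumption: early Kafka source).
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = zk-host:2181
tier1.sources.source1.topic = events
tier1.sources.source1.groupId = flume
tier1.sources.source1.channels = channel1

# In-memory channel: simple, but events are lost if the agent dies.
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

# HDFS sink writing raw event bodies; files roll every 300 seconds only.
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = /tmp/kafka/events
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.rollInterval = 300
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
```

A file channel could be swapped in for the memory channel if durability matters more than throughput.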

Flume cannot write directly to Parquet. You can do the conversion to Parquet using the Kite SDK or by following the instructions here

Re: Kafka->HDFS pipeline

Cloudera Employee

You can find step-by-step instructions on how to configure Flume to read from Kafka in Gwen's blog post "Flume or Kafka? Try both!".

Re: Kafka->HDFS pipeline

Expert Contributor

Thanks for the info.

I couldn't find how to convert to Parquet format using the Kite SDK. Any pointers would be helpful.
