10-17-2014 10:45 AM
Is there any documentation on how to use Kafka to write to HDFS? I'm aware of Camus but not sure how to set it up in the CDH environment. It would also be great if you could explain how to consume from Kafka (JSON or other formats) and write to HDFS in Parquet format.
10-19-2014 05:31 PM
One way to get data from Kafka into HDFS/HBase is via Flume, i.e. Kafka --> Flume --> HDFS/HBase.
You can use Flume's Kafka source to read from Kafka (https://issues.apache.org/jira/browse/FLUME-2250), and then use Flume's HDFS or HBase sink to write the events out. Flume's Kafka source is available in CDH 5.2.
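A Flume agent wiring a Kafka source to an HDFS sink through a memory channel might look like the properties rendered below. This is a minimal sketch: the helper name `flume_kafka_hdfs_config`, the agent/source/channel/sink names, the ZooKeeper address, the topic and the HDFS path are all hypothetical. The property keys follow the ZooKeeper-based Kafka source from FLUME-2250 (`zookeeperConnect`, `topic`, `groupId`); later Flume releases renamed some of these, so check them against the Flume version in your CDH release.

```python
def flume_kafka_hdfs_config(agent: str, zookeeper: str, topic: str, hdfs_path: str) -> str:
    """Render a Flume agent config: Kafka source -> memory channel -> HDFS sink.

    Hypothetical helper for illustration; the output is a plain properties file.
    """
    props = {
        f"{agent}.sources": "source1",
        f"{agent}.channels": "channel1",
        f"{agent}.sinks": "sink1",
        # Kafka source (FLUME-2250): consumes the topic via ZooKeeper.
        f"{agent}.sources.source1.type": "org.apache.flume.source.kafka.KafkaSource",
        f"{agent}.sources.source1.zookeeperConnect": zookeeper,
        f"{agent}.sources.source1.topic": topic,
        f"{agent}.sources.source1.groupId": "flume",
        f"{agent}.sources.source1.channels": "channel1",
        # Memory channel: fast, but buffered events are lost if the agent dies.
        f"{agent}.channels.channel1.type": "memory",
        f"{agent}.channels.channel1.capacity": "10000",
        f"{agent}.channels.channel1.transactionCapacity": "1000",
        # HDFS sink; DataStream writes raw event bodies (e.g. JSON lines) uncompressed.
        f"{agent}.sinks.sink1.type": "hdfs",
        f"{agent}.sinks.sink1.channel": "channel1",
        f"{agent}.sinks.sink1.hdfs.path": hdfs_path,
        f"{agent}.sinks.sink1.hdfs.fileType": "DataStream",
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(flume_kafka_hdfs_config(
        "tier1", "zk01.example.com:2181", "weblogs", "/user/flume/kafka/weblogs"))
```

The rendered file can be pasted into the Flume agent configuration in Cloudera Manager, or passed to `flume-ng agent --conf-file ... --name tier1`, where `--name` must match the agent prefix. A file channel is the usual swap for the memory channel when losing buffered events on a crash is not acceptable.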
Flume cannot write directly to Parquet. You can do the conversion to Parquet using the Kite SDK or by following the instructions here.
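Kite datasets, including Parquet-format ones, are described by an Avro schema, so converting JSON events usually starts with a schema for them. The sketch below derives an Avro record schema from one sample JSON event using only the standard library. It is a hypothetical helper, not part of Kite, and it assumes field names are already valid Avro names, arrays are homogeneous, and nested record names derived from keys do not collide. Every field is made nullable with a `null` default.

```python
import json
from typing import Any


def avro_type(value: Any, name: str) -> Any:
    """Map one decoded JSON value to an Avro type (hypothetical helper)."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return record_schema(value, name)
    if isinstance(value, list):
        # Assume a homogeneous array; an empty list falls back to string items.
        items = avro_type(value[0], name + "Item") if value else "string"
        return {"type": "array", "items": items}
    return "null"  # JSON null: the real type cannot be inferred from one sample


def record_schema(event: dict, name: str = "Event") -> dict:
    """Build an Avro record schema whose fields are nullable with a null default."""
    fields = []
    for key, value in event.items():
        inferred = avro_type(value, key.capitalize())
        # "null" must come first in the union for a null default to be valid.
        field_type = inferred if inferred == "null" else ["null", inferred]
        fields.append({"name": key, "type": field_type, "default": None})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    sample = json.loads('{"id": 42, "user": {"name": "ann"}, "tags": ["a"], "ok": true}')
    print(json.dumps(record_schema(sample), indent=2))
```

The printed schema can be saved as an `.avsc` file and used when creating a Parquet-format Kite dataset; check the Kite documentation for the exact CLI or API call, since a single sample event may not capture every field or type in the topic.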