Explorer
Posts: 16
Registered: ‎09-25-2017

Flume HDFS sink and number of Kafka partitions ?

Is it expected that the Flume HDFS sink creates a separate HDFS file for each Kafka topic partition it consumes events from (via either the Kafka source or the Kafka channel)? Is there a convenient way to consolidate the output into a single HDFS file?

Cloudera Employee
Posts: 273
Registered: ‎01-09-2014

Re: Flume HDFS sink and number of Kafka partitions ?

No. The number of files created in HDFS is determined by hdfs.path and any variables used in it. Whether you use headers (for example, a %{topic} header) in hdfs.path or hdfs.filePrefix determines how many files get written. The sink consumes events from the channel and does not differentiate between topics. A Kafka channel can only have one topic, and a sink can only have one channel, so in that setup the sink effectively reads from one topic. If you use the Flume Kafka source with multiple topics, all of those events end up in the channel that the sink pulls from.
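As a rough illustration of the point about headers, here is a minimal sketch of an agent configuration, assuming a Kafka source reading two topics into a single channel. The agent and component names (a1, r1, c1, k1), the broker address, the topic names, and the HDFS paths are all made up for the example, and it assumes the Kafka source sets a topic header on each event.

```properties
# Hypothetical agent "a1": one Kafka source, one channel, one HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source subscribed to two topics (placeholder broker and topic names).
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092
a1.sources.r1.kafka.topics = topicA,topicB
a1.sources.r1.channels = c1

# Events from both topics land in the same channel.
a1.channels.c1.type = memory

# HDFS sink: no header variables in the path or prefix, so events are
# not split by topic when they are written.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = events

# Alternative: put the topic header in the path to get a separate
# directory (and therefore separate files) per topic.
# a1.sinks.k1.hdfs.path = /flume/events/%{topic}
```

With the first hdfs.path, nothing in the sink configuration distinguishes one topic or partition from another. Switching to the commented-out path is what would split the output by topic.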

-pd