Flume HDFS sink and number of Kafka partitions ?

Is it expected that the Flume HDFS sink creates one HDFS file for each Kafka topic partition from which it consumes events (either via a Kafka source or a Kafka channel)? Is there a convenient way to consolidate the output into a single HDFS file?

1 REPLY

Re: Flume HDFS sink and number of Kafka partitions ?

No. The number of files created in HDFS is determined by hdfs.path and any variables used in it, not by the number of Kafka partitions. Whether or not you reference headers (for example, a %{topic} header) in hdfs.path or hdfs.filePrefix determines how many files get written. The sink consumes events from its channel and does not differentiate between topics. A Kafka channel can only have one topic, and a sink can only have one channel, so a sink reading from a Kafka channel effectively sees one topic. If you use the Flume Kafka source with multiple topics, all of those events end up in the channel that the sink pulls from.
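
To illustrate, here is a rough sketch of an agent config, assuming Flume 1.7 or later property names. The agent name (a1), component names, broker address, topic names, and HDFS paths are all made up for the example, so adjust them to your setup.

```properties
# Hypothetical agent "a1": one Kafka source, one memory channel, one HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source reading two topics (broker address and topic names are placeholders).
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = topicA,topicB
a1.sources.r1.channels = c1

# All events from both topics land in this single channel.
a1.channels.c1.type = memory

# HDFS sink. With no header escapes in hdfs.path or hdfs.filePrefix,
# events from every topic and partition go to the same open file.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = events
a1.sinks.k1.hdfs.fileType = DataStream

# Alternative: split output by topic using the header set by the Kafka source.
# a1.sinks.k1.hdfs.path = /flume/events/%{topic}
```

If you are seeing one file per partition, it is worth checking whether hdfs.path or hdfs.filePrefix references a header that differs per partition (the Kafka source also sets a partition header, as far as I know); removing that escape should consolidate the output.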

-pd