New Contributor
Posts: 5
Registered: 01-13-2019

Flume - multiplexing by event date

I'm trying to set up Flume to consume a Kafka topic that receives around 40k messages a second. My current problem is that the directory structure required for the HDFS sink creates too many files, because event timestamps can vary by days.

node1.sinks.sink1.type = hdfs
node1.sinks.sink1.channel = channel1
node1.sinks.sink1.hdfs.path = hdfs://datawarehouse/data/flume_%{topic}/batch=%{batch}/year=%Y/month=%m/day=%d/hour=%H

Is it possible to use multiplexing to send events older than 24 hours to a specific channel? That way I can send those to an HDFS sink without the "hour=%H" directory partition, which would lessen the number of files dramatically.

New Contributor
Posts: 5
Registered: 01-13-2019

Re: Flume - multiplexing by event date

We got this working with our custom interceptor: it now inserts a new header value that marks whether the event is old or not, and the multiplexing selector can route on that header.
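
A rough sketch of how such a setup might be wired in the agent config, for anyone with the same problem. Everything not in the original post is an assumption made for illustration: the source name `source1`, the second channel and sink (`channel2`, `sink2`), the header name `event_age` with values `old` and `recent`, and the interceptor class `com.example.EventAgeInterceptor`.

```properties
# Hypothetical custom interceptor that sets the header event_age
# to "old" (event older than 24 hours) or "recent".
node1.sources.source1.interceptors = i1
node1.sources.source1.interceptors.i1.type = com.example.EventAgeInterceptor$Builder

# Multiplexing channel selector: route on the event_age header.
node1.sources.source1.channels = channel1 channel2
node1.sources.source1.selector.type = multiplexing
node1.sources.source1.selector.header = event_age
node1.sources.source1.selector.mapping.recent = channel1
node1.sources.source1.selector.mapping.old = channel2
# Events without a matching header value fall back to the hourly path.
node1.sources.source1.selector.default = channel1

# Second HDFS sink for old events, same path but without hour=%H.
node1.sinks.sink2.type = hdfs
node1.sinks.sink2.channel = channel2
node1.sinks.sink2.hdfs.path = hdfs://datawarehouse/data/flume_%{topic}/batch=%{batch}/year=%Y/month=%m/day=%d
```

The HDFS sink resolves `%Y`, `%m`, `%d` and `%H` from the event's `timestamp` header, so that header needs to carry the event time (not the arrival time) for the partitions to reflect the event date.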