Welcome to the Cloudera Community

Hi,

I'm looking for advice on the best way to store streaming data from Kafka in HDFS. I'm currently using Spark Streaming at 30-minute intervals, which creates lots of small files. I have tried using Hive to take advantage of its compaction jobs, but it looks like this isn't supported when writing from Spark yet.

Any advice would be greatly appreciated.

Stream data from Kafka to HDFS