
Spark Streaming or Flume: which is better, and what is the impact?



I'm designing the architecture of my project on HDP 2.6.3. The data flow is:

Filebeat --> Kafka (with a retention period of 7 days in Kafka)

I'll use Spark Streaming for real-time (micro-batch) processing and for data standardization.

I'm unsure which architecture to choose, because Flume no longer exists in HDP 3 and Hortonworks DataFlow (HDF) will replace it.

Could you please give me your suggestions or remarks on how each option affects data processing?

If I use:

Kafka --> Spark Streaming <---> HDFS

What is the impact on data ingestion?

If I use:

Kafka --> Flume --> HDFS <--> Spark Streaming

What is the impact on data ingestion?

Thank you


@Sirine Flume is deprecated as of HDP 3.0, so you should consider DataFlow as an alternative.

Kafka --> Spark Streaming --> HDFS also seems a viable alternative.
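
For illustration only, here is a rough sketch of what the Kafka --> Spark --> HDFS path could look like in PySpark. It uses Structured Streaming (which also runs in micro-batches) rather than the older DStream API, and that is my substitution, not something stated in the thread. The broker address, topic name, HDFS paths, the 30-second trigger, and the `standardize` helper are all hypothetical placeholders. It also assumes the Filebeat events arrive as JSON with `@timestamp` and `message` fields, and that the `spark-sql-kafka-0-10` package is on the classpath.

```python
import json
from typing import Optional

# Hypothetical placeholders: adjust broker, topic, and paths to your cluster.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "broker1:6667",
    "subscribe": "filebeat-logs",
    # Start from the oldest offsets still retained by Kafka (7 days here).
    "startingOffsets": "earliest",
}
HDFS_OUTPUT = "hdfs:///data/standardized/logs"
HDFS_CHECKPOINT = "hdfs:///checkpoints/standardized-logs"


def standardize(raw: Optional[str]) -> Optional[str]:
    """Normalize one Filebeat JSON event; return None if it cannot be parsed."""
    try:
        event = json.loads(raw)
    except (ValueError, TypeError):
        return None
    if not isinstance(event, dict):
        return None
    return json.dumps(
        {
            "timestamp": event.get("@timestamp"),
            "message": (event.get("message") or "").strip(),
        },
        sort_keys=True,
    )


def start_stream(spark):
    """Read from Kafka, standardize each record, and write Parquet files to HDFS."""
    # Imported here so the pure-Python helper above works without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    standardize_udf = udf(standardize, StringType())

    events = (
        spark.readStream.format("kafka")
        .options(**KAFKA_OPTIONS)
        .load()
        # Kafka values are bytes; cast to string before standardizing.
        .select(standardize_udf(col("value").cast("string")).alias("value"))
        .where(col("value").isNotNull())
    )

    return (
        events.writeStream.format("parquet")
        .option("path", HDFS_OUTPUT)
        # The checkpoint stores consumed offsets so a restart resumes where it stopped.
        .option("checkpointLocation", HDFS_CHECKPOINT)
        .trigger(processingTime="30 seconds")
        .start()
    )
```

With this layout Spark writes directly to HDFS, so there is no Flume hop to migrate later. If the job is down for a while, it can resume from its checkpoint as long as the data is still within Kafka's retention window.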

