Spark Streaming or Flume: which is better, and what is the impact?

Hi,

I'm designing the architecture of my project on HDP 2.6.3. The data flow is:

Filebeat --> Kafka (with a retention period of 7 days in Kafka)
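Because Kafka deletes data once the retention period has passed, whether or not it has been consumed, the main ingestion risk in either design is a consumer that falls more than 7 days behind. The minimal sketch below models that in plain Python, not the Kafka API: `Record`, `available` and `lost_offsets` are hypothetical helpers, and it simplifies Kafka by expiring individual records instead of whole log segments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumption taken from the question: topics keep data for 7 days.
RETENTION = timedelta(days=7)


@dataclass
class Record:
    offset: int
    timestamp: datetime
    value: str


def available(records: List[Record], now: datetime) -> List[Record]:
    """Records still retained at `now`.

    Simplification: real Kafka deletes whole log segments, not single records.
    """
    return [r for r in records if now - r.timestamp <= RETENTION]


def lost_offsets(records: List[Record], committed: int, now: datetime) -> List[int]:
    """Offsets the consumer has not read yet (offset >= committed)
    that retention has already deleted."""
    kept = {r.offset for r in available(records, now)}
    return [r.offset for r in records
            if r.offset >= committed and r.offset not in kept]


# One record per day for 10 days; "now" is day 10, so days 0-2 have expired.
start = datetime(2018, 1, 1)
log = [Record(i, start + timedelta(days=i), f"event-{i}") for i in range(10)]
now = start + timedelta(days=10)

print(lost_offsets(log, committed=2, now=now))  # [2]: oldest unread record is 8 days old
print(lost_offsets(log, committed=5, now=now))  # []: oldest unread record is 5 days old
```

This applies equally to Spark Streaming and to Flume as the Kafka consumer: the retention period is the maximum outage the pipeline can absorb without losing data.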

I'll use Spark Streaming for real-time (micro-batch) processing and for data standardization.

I'm hesitating over which architecture to choose, because Flume does not exist in HDP 3 and HDP DataFlow will replace it.

Could you please give me your suggestions or remarks about the impact on data processing?

If I use:

Kafka --> Spark Streaming <---> HDFS

What is the impact on data ingestion?
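For the direct Kafka --> Spark Streaming --> HDFS path, ingestion behaviour depends largely on when consumer offsets are committed relative to the HDFS write. The sketch below is plain Python standing in for that micro-batch loop, not Spark's API; `standardize`, `process_batch`, `flaky_write` and the list used as the "HDFS" sink are hypothetical. Committing the offset only after the write succeeds means a failed batch is replayed, which gives at-least-once delivery: nothing is lost, but a partially written batch can appear twice.

```python
from typing import Callable, List


def standardize(line: str) -> str:
    # Hypothetical standardization step: trim whitespace, normalise case.
    return line.strip().lower()


def process_batch(log: List[str], committed: int, batch_size: int,
                  write: Callable[[List[str]], None]) -> int:
    """Process one micro-batch starting at `committed`; return the new offset.

    The offset advances only if `write` returns normally, so a failed
    write leaves it unchanged and the same batch is read again.
    """
    batch = log[committed:committed + batch_size]
    write([standardize(x) for x in batch])  # may raise
    return committed + len(batch)           # "commit" only after the write


# Demo: the sink fails on its second call *after* storing the rows.
hdfs: List[str] = []
calls = 0


def flaky_write(rows: List[str]) -> None:
    global calls
    calls += 1
    hdfs.extend(rows)
    if calls == 2:
        raise IOError("simulated failure after a partial write")


log = [" A ", "B", "c ", "D"]
offset = 0
while offset < len(log):
    try:
        offset = process_batch(log, offset, 2, flaky_write)
    except IOError:
        pass  # offset not advanced: the batch is replayed on the next pass

print(hdfs)  # ['a', 'b', 'c', 'd', 'c', 'd']
```

The duplicated `'c', 'd'` is the cost of never losing data. One common way to avoid it is to make the output idempotent, for example by writing each batch to a path derived from its offset range so that a replay overwrites instead of appending.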

If I use:

Kafka --> Flume --> HDFS <--> Spark Streaming

What is the impact on data ingestion?
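The Flume path adds a stage between Kafka and Spark, so one visible impact is on latency. Below is a rough worst-case comparison as back-of-the-envelope Python. The parameter values are hypothetical, not measurements. It assumes that Spark picks up a Flume file only once Flume's HDFS sink has rolled (closed) it, and that each Spark batch finishes within its interval so batches do not queue up.

```python
def direct_latency(batch_interval_s: float, processing_s: float) -> float:
    """Kafka -> Spark Streaming -> HDFS: a record arriving just after a batch
    is cut waits one full interval, then the batch is processed."""
    return batch_interval_s + processing_s


def flume_latency(roll_interval_s: float, batch_interval_s: float,
                  processing_s: float) -> float:
    """Kafka -> Flume -> HDFS -> Spark Streaming, ASSUMING Spark only sees a file
    after Flume rolls it: wait for the roll, then for Spark's next batch,
    then for processing."""
    return roll_interval_s + batch_interval_s + processing_s


# Hypothetical settings: 10 s batches, 4 s processing, 60 s Flume roll interval.
print(direct_latency(10, 4))     # 14
print(flume_latency(60, 10, 4))  # 74
```

The extra term is the roll interval. Shortening it lowers latency but produces more, smaller files in HDFS, which is a trade-off the direct path does not have to make.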

Thank you

@Sirine Flume is deprecated as of HDP 3.0; you should consider DataFlow as an alternative.

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/release-notes/content/removed_components.ht...

Kafka -> Spark -> HDFS also seems a viable alternative.

HTH
