Spark Streaming or Flume: which is better, and what is the impact?

Hi,

I'm designing the architecture of my project on HDP 2.6.3. The data flow is:

Filebeat --> Kafka (with a retention period of 7 days in Kafka)

I'll use Spark Streaming for real-time (micro-batch) processing and for data standardization.

I'm hesitating over which architecture to choose, because Flume no longer exists in HDP 3 and HDP DataFlow will replace it.

Could you please give me your suggestions or remarks about how each option affects data processing?

If I use:

Kafka --> Spark Streaming <---> HDFS

What is the impact on data ingestion?

If I use:

Kafka --> Flume --> HDFS <--> Spark Streaming

What is the impact on data ingestion?

Thank you

1 REPLY

Re: Spark Streaming or Flume: which is better, and what is the impact?

@Sirine Flume is deprecated as of HDP 3.0 and is listed among the removed components, so you should consider DataFlow as an alternative.

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/release-notes/content/removed_components.ht...

Kafka -> Spark -> HDFS also seems a viable alternative.
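To give a rough idea of what that path could look like, here is a minimal sketch, not a tested production job. It assumes Spark 2.x with the Structured Streaming Kafka source (the `spark-sql-kafka-0-10` package) on the classpath, and it uses Structured Streaming rather than the older DStream API. The function names, the event field names (`@timestamp`, `host`, `message`), and the one-minute trigger are all hypothetical choices for illustration. Adjust them to your actual Filebeat output.

```python
import json


def standardize(raw):
    """Normalize one JSON log event; return None if it cannot be parsed.

    The field names below are assumptions about the Filebeat payload.
    """
    try:
        event = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(event, dict):
        return None
    cleaned = {
        "timestamp": event.get("@timestamp"),
        "host": str(event.get("host", "")).strip().lower(),
        "message": str(event.get("message", "")).strip(),
    }
    return json.dumps(cleaned, sort_keys=True)


def run_stream(brokers, topic, out_path, checkpoint_path):
    """Read from Kafka, standardize each event, append the result to HDFS."""
    # pyspark is imported here so standardize() can be tested without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()
    standardize_udf = udf(standardize, StringType())

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topic)
        # On first start, read everything Kafka still retains.
        .option("startingOffsets", "earliest")
        .load()
        .select(standardize_udf(col("value").cast("string")).alias("event"))
        .where(col("event").isNotNull())
    )

    query = (
        events.writeStream.format("text")
        .option("path", out_path)
        # The checkpoint stores Kafka offsets so a restart resumes where it stopped.
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()
```

With this layout Spark is the only component writing to HDFS, and the checkpoint plus the 7-day Kafka retention means the job can be stopped and restarted within that window without losing events. A very short trigger interval tends to produce many small files on HDFS, so the interval is worth tuning.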

HTH