Hi,
I'm designing the architecture of my project on HDP 2.6.3. The data flow is:
Filebeat --> Kafka (with a retention period of 7 days in Kafka)
I'll use Spark Streaming for real-time (micro-batch) processing and for data standardization.
I'm hesitating over which architecture to choose, because Flume no longer exists in HDP 3 and Hortonworks DataFlow (HDF) replaces it.
Could you please share your suggestions or remarks about the impact on the data processing pipeline?
If I use:
Kafka --> Spark Streaming <---> HDFS
What is the impact on data ingestion?
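For context, option 1 in my mind looks roughly like the sketch below (Spark consuming directly from Kafka and landing standardized micro-batches on HDFS). Topic name, broker address, group id, and HDFS path are just placeholders, not decided yet:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    // Micro-batch interval of 30s -- an assumption, to be tuned.
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:6667",   // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-ingest",  // placeholder group id
      "auto.offset.reset"  -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Set("logs"), kafkaParams)) // placeholder topic

    // Standardize each record, then land the non-empty micro-batch on HDFS.
    stream.map(record => record.value.trim.toLowerCase)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty)
          rdd.saveAsTextFile(s"hdfs:///data/raw/logs/${System.currentTimeMillis}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```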
If I use:
Kafka --> Flume --> HDFS <--> Spark Streaming
What is the impact on data ingestion?
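And option 2 would be something like the Flume agent config below (Kafka source, memory channel, HDFS sink), with Spark Streaming reading the landed files separately. Again, agent/channel names, broker, topic, and paths are only illustrative:

```properties
# Placeholder agent "a1": Kafka source -> memory channel -> HDFS sink
a1.sources  = kafka-src
a1.channels = mem-ch
a1.sinks    = hdfs-sink

a1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-src.kafka.bootstrap.servers = broker1:6667
a1.sources.kafka-src.kafka.topics = logs
a1.sources.kafka-src.channels = mem-ch

a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 10000

a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs:///data/raw/logs/%Y-%m-%d
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.rollInterval = 300
a1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs-sink.channel = mem-ch
```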
Thank you