I'm a newbie in the Hadoop ecosystem. I want to do a project where I stream tweets and analyze them in Hive, and the whole process has to be done in HDF/NiFi. The project must be scalable. I've seen that people here adopt two different flow strategies:
1.) Get the tweets ---> Put them into the HDFS ---> analyze with Hive
2.) Get the tweets ---> Stream with Kafka(publish/consumer) ---> Put them into the HDFS ---> Analyze with Hive
So, my question is: what's the difference between them? Is the first strategy not scalable? Which strategy would you follow? Thank you.