07-07-2016 02:28 PM
Thank you for your answers, that really helps. I'm a bit further now. The current setup:
A cron'd Python script on the NameNode writes the Kafka stream to HDFS every 5 minutes (external table over JSON files).
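For context, that first script does roughly the following (a simplified sketch, not the real code; it assumes the kafka-python and hdfs packages, and the topic, host, and path names are placeholders):

```python
# Step 1 sketch: drain the Kafka topic and land the messages as a new
# JSON file in the directory backing the external Hive table.
import time

from kafka import KafkaConsumer   # assumption: kafka-python package
from hdfs import InsecureClient   # assumption: hdfs (WebHDFS) package

consumer = KafkaConsumer("app_events",
                         bootstrap_servers="broker1:6667",
                         consumer_timeout_ms=10000)  # stop when idle
client = InsecureClient("http://namenode:50070", user="hdfs")

# One new file per cron run, so Hive sees the data without any rewrite.
path = "/data/app_events_ext/events-%d.json" % int(time.time())
with client.write(path, encoding="utf-8") as writer:
    for message in consumer:
        writer.write(message.value.decode("utf-8") + "\n")
```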
Every hour another script runs an INSERT OVERWRITE that moves the data from the external table into a partitioned, clustered ORC table. That ORC table is meant to be the BI table for near-realtime analysis.
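The hourly step boils down to a single Hive statement, fired from Python (again only a sketch; PyHive is an assumption, any Hive client would do, and the table and column names are placeholders):

```python
# Step 2 sketch: move the staged JSON rows into the partitioned,
# clustered ORC table the BI team queries.
from pyhive import hive   # assumption: PyHive against HiveServer2

cursor = hive.Connection(host="hiveserver2", port=10000).cursor()
cursor.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
cursor.execute("""
    INSERT OVERWRITE TABLE app_events_orc PARTITION (event_date)
    SELECT event_id, event_type, payload, event_date
    FROM app_events_ext
""")
```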
My next plan is to change the first script so that it inserts/updates the Hive table directly, which would let me eliminate the second script.
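Something along these lines is what I have in mind (a sketch only; INSERT ... VALUES needs Hive 0.14+ and works best on a transactional ORC table, and all names, including the JSON field names, are placeholders):

```python
# Sketch of the merged script: consume from Kafka and insert straight
# into the ORC table, skipping the JSON staging area.
import json

from kafka import KafkaConsumer
from pyhive import hive

consumer = KafkaConsumer("app_events",
                         bootstrap_servers="broker1:6667",
                         consumer_timeout_ms=10000)
cursor = hive.Connection(host="hiveserver2", port=10000).cursor()

for message in consumer:
    event = json.loads(message.value)  # field names below are placeholders
    # Row-by-row is the simplest form; batching many VALUES tuples per
    # statement would cut round trips and small delta files.
    cursor.execute(
        "INSERT INTO TABLE app_events_orc VALUES (%s, %s, %s)",
        (event["id"], event["type"], message.value.decode("utf-8")))
```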
Thanks for any suggestions.
06-27-2016 08:17 PM
1 Kudo
Hi, we would like to implement a fairly basic streaming data pipeline into our Hadoop cluster. App events are already sent to a Kafka topic.
The ideal solution would be to stream the data (JSON) directly into a Hive table, so that the BI team can run its analyses on that information in near real time. I researched a bit but did not find any "best practice" solution for this case.
We use Hortonworks HDP with the basic tech stack such as Flume, Spark, ... My questions:
- What is the best practice for an event stream to BI?
- Is there an example that fits this case?
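For illustration, this is the kind of direct path I have in mind (a rough sketch only, assuming PySpark Streaming with the Kafka direct stream; broker, topic, and table names are placeholders, not our actual setup):

```python
# Sketch: read the existing Kafka topic with Spark Streaming and
# append each micro-batch of JSON events to a Hive table.
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hive")
ssc = StreamingContext(sc, 60)  # one micro-batch per minute
hc = HiveContext(sc)

# Direct stream: yields (key, value) pairs, value is the JSON payload
stream = KafkaUtils.createDirectStream(
    ssc, ["app_events"], {"metadata.broker.list": "broker1:6667"})

def store(rdd):
    if not rdd.isEmpty():
        # Infer the schema from the JSON and append the batch to Hive
        df = hc.read.json(rdd.map(lambda kv: kv[1]))
        df.write.mode("append").format("orc").saveAsTable("app_events")

stream.foreachRDD(store)
ssc.start()
ssc.awaitTermination()
```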
Thanks in advance
KF2