07-07-2016
02:28 PM
Thank you for your answers, that really helps. I'm a bit further along now. The current setup:
A cron-scheduled Python script on the NameNode writes the Kafka stream to HDFS every 5 minutes (external table, JSON).
Every hour, a second script runs an "insert overwrite" that moves the data from the external table into an ORC table that is partitioned and clustered.
This ORC table should be the BI table for real-time analysis.
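For illustration, the hourly step could be generated along these lines. This is only a sketch: the function name `build_hourly_load` and the table and column names in the example call (`events_json_ext`, `events_orc`, `event_id`, `payload`, `event_date`) are placeholders, not the real schema, and it assumes the ORC table uses a single dynamic partition column.

```python
def build_hourly_load(source_table, target_table, columns, partition_column):
    """Build the HiveQL for the hourly external-table -> ORC-table load.

    All names passed in are hypothetical placeholders. The partition
    column is selected last, which is what Hive expects for a dynamic
    partition insert.
    """
    select_list = ", ".join(list(columns) + [partition_column])
    return (
        "SET hive.exec.dynamic.partition=true;\n"
        "SET hive.exec.dynamic.partition.mode=nonstrict;\n"
        "INSERT OVERWRITE TABLE {target} PARTITION ({part})\n"
        "SELECT {cols}\n"
        "FROM {source};"
    ).format(
        target=target_table,
        part=partition_column,
        cols=select_list,
        source=source_table,
    )


if __name__ == "__main__":
    # Example call with placeholder names; print the statement for inspection.
    print(build_hourly_load(
        "events_json_ext", "events_orc", ["event_id", "payload"], "event_date"
    ))
```

The generated statement can be saved to a file and run with `hive -f`. With a dynamic partition insert, only the partitions that appear in the SELECT result are overwritten.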
My next plan is to change the first script so that it updates/inserts into the Hive table directly, which would let me eliminate the second script.
Thanks for any suggestions.