Member since
07-05-2017
7
Posts
3
Kudos Received
0
Solutions
04-20-2021
07:33 PM
Thanks for such a nice and detailed blog post. I am looking for a way to avoid duplicate records during Hive streaming. Can anybody please help me?
07-06-2017
02:11 AM
Thanks. Overall, what I understood is that an example like the one described at https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest cannot run outside of an edge node unless that machine is part of the HDP network. We are working with Akka Streams (http://doc.akka.io/docs/akka/snapshot/scala/stream/index.html) to read data from Kafka and sink it into Hive using the HCatalog Streaming API, so that it can scale horizontally across multiple pods (Docker). That will allow us to scale on demand. So ideally we would run a program similar to the one shown on that wiki page outside of an edge node.