Support Questions (Cloudera Community)

how to get updates into hive


Master Collaborator

How can we bring updates into Hive without loading the complete table and checking for the differences? An incremental append only looks at a date column to find newly added records and brings over those records only. We could create a trigger on the source table that writes an additional record on every update, but what is the solution without having to create a trigger?

For example, say we have a table TEST (ID int, NAME string, LOGIN_TIME timestamp). We do incremental appends using the LOGIN_TIME column, so all new records get moved to Hive. But what if someone modifies the NAME column? How will Sqoop pick that up?
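To make the gap concrete, here is a minimal Python simulation of an append-style incremental import: only rows whose check column (LOGIN_TIME) is newer than the last imported value are copied. It is not Sqoop itself; the sample rows and the `incremental_append` helper are hypothetical and only model the idea.

```python
from datetime import datetime

# Hypothetical rows of the source table TEST: (ID, NAME, LOGIN_TIME)
source = [
    (1, "alice", datetime(2016, 1, 1, 9, 0)),
    (2, "bob", datetime(2016, 1, 1, 10, 0)),
]

def incremental_append(rows, last_value):
    """Return rows whose LOGIN_TIME is newer than last_value, plus the new last_value.

    Models an append-mode incremental import keyed on a check column;
    an illustration only, not Sqoop's implementation.
    """
    new_rows = [r for r in rows if r[2] > last_value]
    new_last = max((r[2] for r in new_rows), default=last_value)
    return new_rows, new_last

# First run: every row is newer than datetime.min, so everything is imported.
hive = []
batch, last = incremental_append(source, datetime.min)
hive.extend(batch)

# Someone renames user 1 without touching LOGIN_TIME ...
source[0] = (1, "alicia", source[0][2])
# ... and a genuinely new row arrives.
source.append((3, "carol", datetime(2016, 1, 2, 8, 0)))

# Second run: only the new row passes the LOGIN_TIME filter.
batch, last = incremental_append(source, last)
hive.extend(batch)

print(sorted(r[1] for r in hive))  # ['alice', 'bob', 'carol']
```

The rename of ID 1 never reaches the Hive copy because its LOGIN_TIME did not move past the last imported value, which is exactly the problem described above.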


Re: how to get updates into hive

Master Collaborator

Can someone please advise?

Re: how to get updates into hive

New Contributor

I would create a placeholder (staging) table for the new data, defined as an external table with its location in HDFS. Your procedure can drop additional files into that location. An INSERT INTO query that selects from that table then pulls your (delta) updates into the target table.
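Note that INSERT INTO only appends, so an updated row ends up in the target twice (old and new version). A common way to handle this, not stated in the reply above, is to reconcile base and delta by keeping only the newest version of each key. The plain-Python sketch below models that; the BATCH column is a hypothetical counter that the extract procedure would stamp on each delta file, since TEST itself has no column that changes when NAME is updated.

```python
# Rows are (ID, NAME, LOGIN_TIME, BATCH). BATCH is a hypothetical counter the
# extract procedure stamps each time it drops files into the staging location.
base = [
    (1, "alice", "2016-01-01 09:00", 0),
    (2, "bob", "2016-01-01 10:00", 0),
]

# Delta files found in the external staging table's HDFS location.
staging = [
    (1, "alicia", "2016-01-01 09:00", 1),  # NAME was updated
    (3, "carol", "2016-01-02 08:00", 1),   # new row
]

def insert_into(target, delta):
    """Model of INSERT INTO target SELECT * FROM staging: a pure append."""
    return target + delta

def reconcile(rows):
    """Keep only the newest version (highest BATCH) of each ID, sorted by ID."""
    latest = {}
    for row in rows:
        key, batch = row[0], row[3]
        if key not in latest or batch > latest[key][3]:
            latest[key] = row
    return sorted(latest.values())

appended = insert_into(base, staging)
print(len(appended))                         # 4: ID 1 now appears twice
print([r[1] for r in reconcile(appended)])   # ['alicia', 'bob', 'carol']
```

In Hive the reconcile step could be expressed as a view over the target table that ranks rows with `ROW_NUMBER() OVER (PARTITION BY id ORDER BY batch DESC)` and keeps rank 1; treat that as one possible design rather than the method the reply describes.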

Re: how to get updates into hive

Mentor

@Samuel Peeters

I think what you are looking for is unmanaged updates: capturing the changes in your source database/table and replicating them to Hive in near real time.

The solution is a combination of Kafka and Flume, a fairly simple but effective setup, similar to the one described here:

Kafka-to-hive-streaming

Read through it and adjust it to your environment.
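The idea behind a Kafka/Flume pipeline is that every insert, update, or delete in the source is captured as an event and replayed in order on the Hive side, so a change to NAME is propagated even though LOGIN_TIME never moves. The sketch below models only that replay logic in plain Python. The `ChangeEvent` record and its fields are hypothetical, and no Kafka, Flume, or Hive streaming API is used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class ChangeEvent:
    """Hypothetical change-data-capture record, as it might arrive on a topic."""
    offset: int                      # position in the stream; events apply in this order
    op: str                          # "insert", "update" or "delete"
    row_id: int                      # the ID column of TEST
    name: Optional[str] = None
    login_time: Optional[str] = None

def apply_events(
    table: Dict[int, Tuple[str, str]], events: List[ChangeEvent]
) -> Dict[int, Tuple[str, str]]:
    """Replay events in offset order against a copy of TEST keyed by ID."""
    result = dict(table)  # leave the input untouched
    for ev in sorted(events, key=lambda e: e.offset):
        if ev.op in ("insert", "update"):
            result[ev.row_id] = (ev.name, ev.login_time)
        elif ev.op == "delete":
            result.pop(ev.row_id, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return result

hive_copy = {1: ("alice", "2016-01-01 09:00"), 2: ("bob", "2016-01-01 10:00")}
events = [
    ChangeEvent(offset=11, op="insert", row_id=3, name="carol", login_time="2016-01-02 08:00"),
    ChangeEvent(offset=10, op="update", row_id=1, name="alicia", login_time="2016-01-01 09:00"),
    ChangeEvent(offset=12, op="delete", row_id=2),
]
print(apply_events(hive_copy, events))
```

A real pipeline also has to respect Hive's own constraints on changing existing rows (for example, row-level UPDATE and DELETE require transactional tables), so treat this only as a model of the change-capture logic.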