How can we bring updates into Hive without loading the complete table and checking for differences? An incremental append only looks at a date column for newly added records and brings in only those records. We could always create a trigger on the source table that writes an additional record on any update, but what is the solution without having to create a trigger?
E.g. let's say we have a table TEST (ID int, NAME string, LOGIN_TIME timestamp). We do incremental appends using the LOGIN_TIME column, so all new records get moved to Hive. But what if someone modifies the NAME column? How will Sqoop pick it up?
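To make the problem concrete, here is a minimal Python sketch of the filtering an incremental append performs. It is not Sqoop code: `incremental_append`, `source_rows`, and the sample values are hypothetical and only simulate "take rows whose check column is greater than the last imported value".

```python
from datetime import datetime

# Hypothetical snapshot of the source table TEST (ID, NAME, LOGIN_TIME).
source_rows = [
    {"ID": 1, "NAME": "alice", "LOGIN_TIME": datetime(2020, 1, 1, 9, 0)},
    {"ID": 2, "NAME": "bob", "LOGIN_TIME": datetime(2020, 1, 2, 9, 0)},
]


def incremental_append(rows, last_value):
    """Return only rows whose LOGIN_TIME is strictly greater than last_value."""
    return [r for r in rows if r["LOGIN_TIME"] > last_value]


# High-water mark recorded after the first import.
last_value = datetime(2020, 1, 2, 9, 0)

# Someone changes NAME for ID 1 without touching LOGIN_TIME,
# and a genuinely new row (ID 3) arrives.
source_rows[0]["NAME"] = "alicia"
source_rows.append(
    {"ID": 3, "NAME": "carol", "LOGIN_TIME": datetime(2020, 1, 3, 9, 0)}
)

picked_up = incremental_append(source_rows, last_value)
picked_ids = [r["ID"] for r in picked_up]
print(picked_ids)
```

Only the new row is selected; the renamed row for ID 1 is skipped because its LOGIN_TIME did not change.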
I would create a placeholder table over the new data as an external table with its location in HDFS. Your procedure could drop additional files into that location. Subsequently, an INSERT INTO query from that table would pull your (delta) updates into the target table.
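One thing to keep in mind: INSERT INTO appends rows, so an updated record would end up next to its old version unless the delta is reconciled against the base table by key. A rough Python sketch of that reconciliation logic, assuming ID is the key and that a delta row should replace the base row with the same ID (`merge_delta` and the sample rows are hypothetical, not part of Hive or Sqoop):

```python
def merge_delta(base_rows, delta_rows, key="ID"):
    """Merge delta rows into base rows; a delta row replaces the base row with the same key."""
    merged = {row[key]: row for row in base_rows}
    for row in delta_rows:
        # Later (delta) version wins over the existing base version.
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


# Rows already in the target table.
base = [
    {"ID": 1, "NAME": "alice"},
    {"ID": 2, "NAME": "bob"},
]

# Rows dropped into the external (delta) location: one update, one new record.
delta = [
    {"ID": 1, "NAME": "alicia"},
    {"ID": 3, "NAME": "carol"},
]

result = merge_delta(base, delta)
for row in result:
    print(row)
```

The same idea would apply in Hive by picking one row per ID from the union of the base and delta tables, but the exact query depends on your table layout.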
I think what you are looking for is unmanaged updates: capturing the changes in your source database/table and replicating them to Hive in near real time.
The solution is similar to this: a combination of Kafka/Flume, a quite simple but effective setup.
Read through it and adjust it to your environment.