How can we bring updates into Hive without loading the complete table and checking for differences? An incremental append only looks at a date column for newly added records and brings in only those records. We could always create a trigger on the source table to write an additional record on any update, but what is the solution without having to create a trigger?
E.g., let's say we have a table TEST (ID int, NAME string, LOGIN_TIME timestamp).
We do incremental appends using the LOGIN_TIME column, so all new records get moved to Hive.
But what if someone modifies the NAME column? How will Sqoop pick that up?
I would create an external placeholder table over the new data, with its location in HDFS. Your procedure could drop additional files into that location. A subsequent INSERT INTO query selecting from that table would then pull your (delta) updates in.
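One thing to keep in mind: a plain INSERT INTO appends rows, so a record whose NAME changed would then exist twice in the target table (old and new version). The delta usually has to be reconciled against the existing rows by key. Below is a minimal Python sketch of just that reconciliation logic, not of Sqoop or Hive themselves. It assumes ID is the unique key of TEST and that a row in the delta should always replace the base row with the same ID; the function name `merge_delta` and the sample rows are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# One row of the TEST table from the question: (ID, NAME, LOGIN_TIME).
Row = Tuple[int, str, str]


def merge_delta(base: Iterable[Row], delta: Iterable[Row]) -> List[Row]:
    """Merge delta rows into base rows, keyed on ID.

    Assumption: ID is unique, and a delta row always wins over the
    base row with the same ID. New IDs are simply added.
    """
    merged: Dict[int, Row] = {}
    for row in base:
        merged[row[0]] = row
    for row in delta:
        # Overwrites the base version if the ID already exists.
        merged[row[0]] = row
    # Sort by ID so the output order is deterministic.
    return sorted(merged.values())


# Hypothetical sample data: ID 2 had its NAME changed, ID 3 is new.
base = [
    (1, "alice", "2020-01-01 10:00:00"),
    (2, "bob", "2020-01-02 11:00:00"),
]
delta = [
    (2, "robert", "2020-01-02 11:00:00"),
    (3, "carol", "2020-01-03 12:00:00"),
]

for row in merge_delta(base, delta):
    print(row)
```

In Hive the same idea would be expressed as a query over the base table and the external delta table that keeps one row per ID. The sketch only shows why the delta needs a key to match on, which LOGIN_TIME alone does not provide when only NAME changes.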