Created 01-09-2018 08:42 AM
I have one Hive table with historical data, and every day a new Hive table is created with new data. I want to load this data into the historical table without overwriting the older data.
Created 01-09-2018 08:51 AM
Do
Insert into table <historical-table-name> select * from <new-table-name>;
The insert into statement appends to the existing data.
If the historical and new tables have the same structure, you can use select * from the new table.
If the structures differ, you need to match the historical table's column order and types when you select from the new table.
If instead you want to overwrite the existing data, change into to overwrite, and the statement is as follows:
Insert overwrite table <historical-table-name> select * from <new-table-name>;
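As a sketch of the column-matching case above (the table and column names are hypothetical):

```sql
-- Historical table has columns (id, name, batchid); the new table has the
-- same columns plus an extra one we don't want to carry over.
insert into table historical_tbl
select id, name, batchid   -- list only the matching columns, in the historical table's order
from new_tbl;
```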
Created 01-10-2018 08:31 AM
Hi @Shu,
Thanks for the response. I need to load the Hive table incrementally. My source table has a batchid column, and I tried --incremental append --check-column batchid, but when I run it a second time the records are duplicated. How can I achieve this without duplicates?
sqoop import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive
Created 01-10-2018 03:10 PM
Create a sqoop job for your import as
sqoop job --create <job-name> -- import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive
Once you create the sqoop job, sqoop stores the last value of batchid (the --check-column argument). Whenever you run the job again, sqoop pulls only the new records after the last saved value.
Sqoop Job Arguments:-
$ sqoop job
  --create <job-name>   Define a new saved job with the specified job-id (name). A second sqoop command line, separated by a --, should be specified; this defines the saved job.
  --delete <job-name>   Delete a saved job.
  --exec <job-name>     Given a job defined with --create, run the saved job.
  --show <job-name>     Show the parameters for a saved job.
  --list                List all saved jobs.
These are the arguments you can use with the sqoop job command to execute, list, delete jobs, etc.
Use the --password-file option to set the path to a file containing the authentication password when creating sqoop jobs.
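As a sketch of the saved-job lifecycle (the job name journeypositions_incr and the password-file path are hypothetical; the import arguments are the same as in the command above):

```sh
# Create the saved job once; everything after the lone -- defines the import
sqoop job --create journeypositions_incr -- import \
  --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices" \
  --password-file /user/hive/.sqlserver.password \
  --table JourneyPositions --check-column batchid --incremental append -m 1 \
  --hive-import --hive-table mmidwpresentation.journeypositions_archive \
  -- --schema safedrive

# Run it on a schedule (e.g. from cron); sqoop updates the stored last value each run
sqoop job --exec journeypositions_incr

# Inspect the stored parameters, including the incremental last value
sqoop job --show journeypositions_incr
```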
Created 01-09-2018 09:01 AM
Option 1 using hive queries.
Let us assume there are two tables, A and B. You want to copy data from B to A. Then:
insert into table A select * from B;                -- use this when both tables have the same schema
insert into table A select col1, col2, col3 from B; -- use this when the schemas differ and you want
                                                    -- to insert selected columns from table B
Option 2 using sqoop.
You can schedule a sqoop job to do this in incremental mode. The link below gives more info:
https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
Thanks,
Aditya
Created 01-10-2018 08:32 AM
HI @Aditya Sirna,
Thanks for the response. I need to load the Hive table incrementally. My source table has a batchid column, and I tried --incremental append --check-column batchid, but when I run it a second time the records are duplicated. How can I achieve this without duplicates?
sqoop import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive
Created 01-10-2018 09:13 AM
Try passing '--last-value {last-batch-id}' also in the import command.
Assume data is like this
batchId  col1  col2  col3
1        a     b     c
2        a     b     c
3        a     b     c
4        d     e     f
5        d     e     f
6        d     e     f
If you pass "--check-column batchId --last-value 3", it will import all rows after the mentioned batchId, i.e. the records below will be imported:
4  d  e  f
5  d  e  f
6  d  e  f
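A sketch of the original import command with an explicit last value (all other arguments as in the command posted above):

```sh
sqoop import \
  --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" \
  --table JourneyPositions \
  --check-column batchid --incremental append --last-value 3 \
  -m 1 --hive-import --hive-table mmidwpresentation.journeypositions_archive \
  -- --schema safedrive
```

On the next run you would have to raise --last-value to the highest batchid already imported, which is why a saved sqoop job that tracks this automatically is usually preferable.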
Created 01-10-2018 12:38 PM
Hi @Aditya Sirna,
I want to schedule this job, so I cannot hard-code a value like 3; it should be picked up dynamically.
Created 01-10-2018 01:15 PM
You can use sqoop saved jobs to do this. This will be handled automatically.
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_saved_jobs