How to load data into an existing Hive table without overwriting the existing data?

Rising Star

I have one Hive table with historical data, and every day a new Hive table is created with new data. I want to load this new data into the historical table without overwriting the older data.

Master Guru

@Ravikiran Dasari,

Do

Insert into table <historical-table-name> select * from <new-table-name>;

The insert into statement appends to the existing data.

If the table structure of both the historical and new tables is the same, then you can use select * from the new table.

If the structure is not the same, you need to match the historical table's structure (column order and types) when you select from the new table.
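
For example, a minimal sketch of the second case, with made-up column names (here hist_col3 has no source column in the new table, so it is filled with null):

insert into table historical_table
select new_col1 as hist_col1, new_col2 as hist_col2, null as hist_col3
from new_table;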

In addition, if you want to overwrite the existing data, then you just need to use overwrite instead of into, and the statement is as follows:

Insert overwrite table <historical-table-name> select * from <new-table-name>;

Rising Star

Hi @Shu,

Thanks for the response. I need to load the Hive table incrementally; my source table has a batchid column, so I tried --incremental append --check-column batchid. But when I run it a second time, the records get duplicated. How can I achieve this without duplicates?

sqoop import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive

Master Guru
@Ravikiran Dasari

Create a sqoop job for your import as follows:

sqoop job --create <job-name> -- import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive

Once you create the sqoop job, sqoop will store the last value of batchid (its --check-column argument). Whenever you run the job again, sqoop will pull only the new records after the last stored value.

Sqoop job arguments:

$ sqoop job
    --create <job-name>    Define a new saved job with the specified job-id (name).
                           A second sqoop command line, separated by a --, should be
                           specified; this defines the saved job.
    --delete <job-name>    Delete a saved job.
    --exec <job-name>      Given a job defined with --create, run the saved job.
    --show <job-name>      Show the parameters for a saved job.
    --list                 List all saved jobs.

These are the arguments you can use with the sqoop job command to create, execute, list, and delete jobs, etc.
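
For example, once the job exists, you can list, run, and inspect it (the job name is whatever you passed to --create):

sqoop job --list                 # list all saved jobs
sqoop job --exec <job-name>      # run the saved incremental import
sqoop job --show <job-name>      # show the job's saved parameters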

Use the --password-file option to set the path to a file containing the authentication password when creating sqoop jobs, instead of putting the password in the connect string.
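
A minimal sketch, assuming the password file is placed at /user/<your-user>/.sqoop.pwd (the path and permissions are examples):

# create the password file without a trailing newline and lock it down
echo -n 'ReportingServices' > .sqoop.pwd
hdfs dfs -put .sqoop.pwd /user/<your-user>/.sqoop.pwd
hdfs dfs -chmod 400 /user/<your-user>/.sqoop.pwd

# same job as above, but without the password in the connect string
sqoop job --create <job-name> -- import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices" --password-file /user/<your-user>/.sqoop.pwd --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive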

Super Guru

@Ravikiran Dasari,

Option 1: using Hive queries.

Let us assume there are two tables, A and B, and you want to copy data from B to A. Then:

insert into table A select * from B;                  -- use this when both tables have the same schema
insert into table A select col1, col2, col3 from B;   -- use this when the schemas differ and you want to insert selected columns from table B

Option 2: using sqoop.

You can schedule a sqoop job to do this in incremental mode. The links below give more info, and there is a scheduling sketch after them:

https://community.hortonworks.com/questions/10710/sqoop-incremental-import-working-fine-now-i-want-k...

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
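
For example, a minimal sketch of scheduling the saved job nightly with cron (the job name, time, and log path are all examples):

# run the saved incremental import every night at 02:00
0 2 * * * sqoop job --exec <job-name> >> /var/log/sqoop_incremental.log 2>&1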

Thanks,

Aditya

Rising Star

Hi @Aditya Sirna,

Thanks for the response. I need to load the Hive table incrementally; my source table has a batchid column, so I tried --incremental append --check-column batchid. But when I run it a second time, the records get duplicated. How can I achieve this without duplicates?

sqoop import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive

Super Guru

@Ravikiran Dasari,

Try passing '--last-value {last-batch-id}' also in the import command.

Assume the data looks like this (batchId, col1, col2, col3 are the column names):

batchId  col1  col2  col3
1        a     b     c
2        a     b     c
3        a     b     c
4        d     e     f
5        d     e     f
6        d     e     f

If you pass "--check-column batchId --last-value 3", it will import all rows after the mentioned batchId, i.e. the records below will be imported:

4        d     e     f
5        d     e     f
6        d     e     f
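
Putting it together, a one-off import with an explicit last value would look like this (using 3 from the example above, with the connect string from earlier in the thread):

sqoop import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append --last-value 3 -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import -- --schema safedrive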

Rising Star

Hi @Aditya Sirna,

I want to schedule this job, so I cannot hard-code a value like 3; it should pick up the last value dynamically.

Super Guru

@Ravikiran Dasari,

You can use sqoop saved jobs to do this; the last value will be tracked and handled automatically.

http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_saved_jobs
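
Once a saved job runs, sqoop records the new last value itself; you can verify the stored state, for example (job name is hypothetical):

sqoop job --show <job-name>      # output includes the stored incremental.last.value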
