Support Questions

Find answers, ask questions, and share your expertise

Sqoop import to Hive keeps storing repeated records in the same Hive table


1. I run the following Sqoop import command:

sqoop import --connect jdbc:mysql://localhost/test --table a2 --username root --password -m 1 --hive-import --hive-database default --hive-table a2 --target-dir /tmp/n11 --driver com.mysql.jdbc.Driver

The MySQL table a2 contains 2 records, for example:

id | name
1  | aa
2  | bb

2. Initially I ran the command above; it created the Hive table and loaded the 2 records.

3. Then I ran the same command again, and it appended the same records, like this:

id | name
1  | aa
2  | bb
1  | aa
2  | bb

How can I avoid generating these duplicate records in the Hive table when importing with Sqoop? Please suggest a solution.

thanks in advance

swathi
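For reference, re-running the same import appends to the existing Hive table by default; one common fix is Sqoop's --hive-overwrite flag, which replaces the table contents on each run. A sketch based on the command above (with -P, which prompts for the password, standing in for the missing --password value):

```shell
# Sketch: same import as above, but with --hive-overwrite so a re-run
# replaces the Hive table contents instead of appending to them.
sqoop import \
  --connect jdbc:mysql://localhost/test \
  --table a2 \
  --username root -P \
  -m 1 \
  --hive-import \
  --hive-database default \
  --hive-table a2 \
  --hive-overwrite \
  --target-dir /tmp/n11 \
  --driver com.mysql.jdbc.Driver
```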

1 ACCEPTED SOLUTION

4 REPLIES



Thank you so much

Contributor

Dear all,

I have a table in SQL Server where a column contains random unique numbers and there is no primary key, but we want to perform an incremental append or lastmodified import using Sqoop, so please help me.

Note: this is a critical issue.
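For context, Sqoop's --incremental append mode needs a check column whose values only grow (a sequential ID or timestamp), while lastmodified needs a timestamp column; a column of random unique numbers satisfies neither on its own. A sketch assuming a hypothetical modified_ts timestamp column exists (the host, database, table, and column names here are placeholders, not from the original post):

```shell
# Sketch: incremental "lastmodified" import from SQL Server.
# modified_ts, unique_id, dbhost, mydb, mytable, myuser are all
# hypothetical placeholders for illustration only.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username myuser -P \
  --table mytable \
  --incremental lastmodified \
  --check-column modified_ts \
  --last-value "2019-01-01 00:00:00" \
  --merge-key unique_id \
  --target-dir /data/mytable \
  -m 1
```

On later runs, Sqoop reports the new --last-value to use, and --merge-key deduplicates updated rows against the existing data.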

Contributor

Greg's answer applies to your case as well for incremental import/export operations. Also, if your source table has a column that is a sequential index, it can be used in the --split-by clause to distribute the data across mappers, scale parallelism, and reduce the job's runtime.

My understanding is that random numbers in a column, if used as the split key, can cause skew as well, leading to different runtimes for the map tasks.
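As an illustration of the --split-by point above (the table and column names are hypothetical):

```shell
# Sketch: splitting the import across 4 mappers on a sequential index column.
# Sqoop computes MIN/MAX of order_id and gives each mapper a contiguous range;
# a uniformly distributed split column keeps the ranges (and runtimes) balanced.
sqoop import \
  --connect jdbc:mysql://localhost/test \
  --table orders \
  --username root -P \
  --split-by order_id \
  -m 4 \
  --target-dir /data/orders
```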