Support Questions


incremental load from mysql to hdfs in hadoop

New Contributor

I am doing an incremental load from MySQL into an HDFS directory:

sqoop import --connect jdbc:mysql://localhost/hadoopdb --username smas --password MyNewPass --table emp1 -m 1 --target-dir /data_new7 --incremental append --check-column id --last-value 2

I already have /date_new7/part-m-00000, but the incremental run did not seem to work.
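For reference, the target directory can be inspected to see what each run produced (a sketch, assuming the /data_new7 path from the command above):

# list the files Sqoop wrote to the target directory
hdfs dfs -ls /data_new7
# print their contents; an incremental append adds a new part file instead of rewriting part-m-00000
hdfs dfs -cat /data_new7/part-m-*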

(screenshots attached: 13761-1.png, 13762-2.png, 13763-3.png)

How can I make sure that part-m-00000 is updated with the 3rd row (id = 3)?

It seems to be importing it as a separate table. Any suggestions?

1 ACCEPTED SOLUTION

Master Guru

It worked. part-m-00001 is not a separate table; it's just another file in your import directory. If you create an external Hive table on /date_new7, Hive will see a single table with 3 rows. The same goes for MapReduce jobs that take /date_new7 as their input. If you end up with many small files, you can merge them into one from time to time, for example with Hadoop Streaming: see this example and set "mapreduce.job.reduces=1".
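For example, a minimal external table over the import directory could be declared from the shell like this (a sketch: the table name emp1_ext and the id/name columns are assumptions, and Sqoop's default comma field delimiter is assumed):

hive -e "CREATE EXTERNAL TABLE emp1_ext (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/date_new7';"
# the table reads every part-m-* file under /date_new7, so the appended row appears alongside the original ones
hive -e "SELECT * FROM emp1_ext;"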

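As a rough sketch of the merge step mentioned above, an identity Hadoop Streaming job with a single reducer concatenates the part files into one (the streaming jar path and the output directory name are assumptions; note the rows come back sorted by key, not in their original order):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mapreduce.job.reduces=1 \
  -input /date_new7 \
  -output /date_new7_merged \
  -mapper cat \
  -reducer cat
# the single reducer writes one part-00000 file containing all the input rows
hdfs dfs -ls /date_new7_merged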
