Created 07-18-2016 05:52 PM
Hi,
I am trying to import data from MySQL to HDFS using Sqoop. The command is as below:
sqoop import --connect jdbc:mysql://192.168.218.128/sqoopdb -username hadoop --table EMP_ADD --driver com.mysql.jdbc.Driver --m 1 --where "CITY='sec-bad'" --target-dir /Practice/SqoopToHDFSWhere
After checking the generated file in HDFS, I see that the data is duplicated:
[hdfs@sandbox root]$ hadoop fs -cat /Practice/SqoopToHDFSWhere/part-m-00000
1202,108I,aoc,sec-bad
1204,78B,old city,sec-bad
1205,720X,hitec,sec-bad
1202,108I,aoc,sec-bad
1204,78B,old city,sec-bad
1205,720X,hitec,sec-bad
Please help me with this.
PS: I am using HDP 2.4.
Regards,
Suresh Kumar
Created 07-18-2016 06:14 PM
Try this:
sqoop import --connect jdbc:mysql://192.168.218.128/sqoopdb --driver "com.teradata.jdbc.TeraDriver" --username hadoop --password Hadoop@1 --query "select * from emp_add where city='sec-bad' AND \$CONDITIONS" --target-dir /Practice/SqoopToHDFSWhere/ --m 1;
Created 07-19-2016 07:32 AM
Hi Divakar,
When I tried the above, I got a new error:
16/07/19 07:22:58 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.teradata.jdbc.TeraDriver
java.lang.RuntimeException: Could not load db driver class: com.teradata.jdbc.TeraDriver
    at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:856)
    at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:744)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241).....
I just want to know why the data is inserted twice when I run the command only once. Do any settings or configurations need to be changed?
Please find attached sqoopduplication.jpg with the complete execution output.
According to the MapReduce output, it is retrieving 6 records. Please suggest.
Created 07-19-2016 11:16 AM
Can you confirm that your MySQL query itself is not returning duplicates, i.e. "select * from emp_add where city='sec-bad'"?
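As a rough sketch of how that could be checked (this is not from the original posts): running the same WHERE clause with COUNT(*) in the mysql client shows how many rows the source returns, and the imported file can be scanned for repeated lines with sort and uniq. The helper name count_duplicates is hypothetical, and the sample rows are copied from the part-m-00000 output in the question.

```shell
#!/bin/sh
# count_duplicates (hypothetical helper): reads lines on stdin and prints
# only the lines that occur more than once, each prefixed by its count.
count_duplicates() {
  sort | uniq -c | awk '$1 > 1'
}

# Sample rows copied from the part-m-00000 output shown in the question.
printf '%s\n' \
  '1202,108I,aoc,sec-bad' \
  '1204,78B,old city,sec-bad' \
  '1205,720X,hitec,sec-bad' \
  '1202,108I,aoc,sec-bad' \
  '1204,78B,old city,sec-bad' \
  '1205,720X,hitec,sec-bad' | count_duplicates

# On the cluster, the same check could be fed from HDFS (path from the question):
#   hadoop fs -cat /Practice/SqoopToHDFSWhere/part-m-00000 | count_duplicates
# On the MySQL side (assuming the mysql client is installed), a count such as
#   mysql -h 192.168.218.128 -u hadoop -p sqoopdb -e "SELECT COUNT(*) FROM EMP_ADD WHERE CITY='sec-bad'"
# returning 6 rather than 3 would point at duplicate rows in the source table.
```

If the helper prints nothing, the input has no repeated lines. If the MySQL count already matches the number of rows in HDFS, the duplicates come from the table rather than from Sqoop.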
Created 07-19-2016 06:48 PM
There was a typo in the sqoop command: use the MySQL driver instead of the Teradata driver.
Here is the modified script:
sqoop import --connect jdbc:mysql://192.168.218.128/sqoopdb --driver com.mysql.jdbc.Driver --username hadoop --password Hadoop@1 --query "select * from emp_add where city='sec-bad' AND \$CONDITIONS" --target-dir /Practice/SqoopToHDFSWhere/ --m 1;