Created 10-16-2017 02:45 AM
How can I increase the performance of a Sqoop import using --split-by and the number of mappers?
Created 10-16-2017 01:08 PM
To improve Sqoop import performance, increase the number of mappers, taking into account the load on the source database and the number of records being ingested into HDFS. For --split-by, try to use the primary key, since it identifies each record uniquely. That way the records are divided across multiple mappers and the ingestion runs faster. Hope it helps!!
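To illustrate how the split column spreads work across mappers, here is a simplified sketch, not Sqoop's actual code. It assumes an integer split column named id (hypothetical) whose minimum and maximum are already known, and prints the range condition each mapper would be given. The function name split_ranges and the values 1, 1000 and 4 are made up for the example.

```shell
#!/bin/sh
# Simplified sketch: divide the range [lo, hi] of an integer split column
# into n roughly equal slices, one per mapper.
# "id" is a hypothetical column name; lo and hi stand in for the
# MIN and MAX of the split column.
split_ranges() {
  lo=$1
  hi=$2
  n=$3
  step=$(( (hi - lo) / n ))
  # Guard against a zero step when there are more mappers than values.
  if [ "$step" -lt 1 ]; then
    step=1
  fi
  start=$lo
  i=1
  while [ "$i" -le "$n" ] && [ "$start" -le "$hi" ]; do
    if [ "$i" -eq "$n" ]; then
      # The last slice is closed so that the maximum value is included.
      echo "id >= $start AND id <= $hi"
    else
      end=$(( start + step ))
      echo "id >= $start AND id < $end"
      start=$end
    fi
    i=$(( i + 1 ))
  done
}

# Example: ids from 1 to 1000 spread over 4 mappers.
split_ranges 1 1000 4
```

If the values of the split column are unevenly distributed, some slices will hold far more rows than others, which is one reason a unique key is a reasonable choice for --split-by.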
Created 10-16-2017 03:47 PM
@Bala Vignesh N V - Thanks for your reply.
When I try to run a Sqoop import from DB2:
sqoop import --connect "jdbc:db2://<connectionString>/<DBname>" --username <user> --password <pswd> --table <tablename> --split-by <primary key> --target-dir <dirPath> -m 10
I get the exception below, but if I remove "-m 10" from the above command it works absolutely fine with the default 4 +2 mappers:
Error: java.io.IOException: SQLException in nextKeyValue
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=駂;AT MICROSECONDS MICROSECOND SECONDS SECOND MINUTES MINUTE HOURS, DRIVER=4.19.66
    at com.ibm.db2.jcc.am.kd.a(kd.java:747)
    at com.ibm.db2.jcc.am.kd.a(kd.java:66)
Created 10-17-2017 12:30 PM
You have to correct the Sqoop syntax. The mapper parameter (-m) has to be specified before the target directory (--target-dir). Please correct the syntax and re-run the Sqoop command. It should work fine. Hope it helps!!