Can I use --split-by with multiple mappers in Sqoop?

New Contributor

How can I increase the performance of a Sqoop import using --split-by and multiple mappers?

3 REPLIES

@Rashi Jain

To increase the performance of a Sqoop import, increase the number of mappers according to the load the source can handle and the number of records being ingested into HDFS. Also, for --split-by, try to use the primary key, which identifies each record uniquely, so that the records are split across multiple mappers and the ingestion runs faster. Hope it helps!!
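To make the splitting idea concrete, here is a small sketch of how a numeric --split-by column could be divided among mappers. Sqoop looks up the minimum and maximum of the split column and hands each mapper one slice of that range. The function name split_ranges and the sample values are made up for illustration, and this is a simplified model, not Sqoop's own code, so the real boundaries may fall differently.

```shell
#!/bin/sh
# Simplified model of range splitting on an integer --split-by column.
# split_ranges is a hypothetical helper, not part of Sqoop.
split_ranges() {
  min=$1; max=$2; mappers=$3
  total=$((max - min + 1))      # number of key values in the range
  size=$((total / mappers))     # base slice size per mapper
  rem=$((total % mappers))      # leftover values, spread over the first mappers
  lo=$min
  i=0
  while [ "$i" -lt "$mappers" ]; do
    n=$size
    if [ "$i" -lt "$rem" ]; then n=$((n + 1)); fi
    hi=$((lo + n - 1))
    echo "mapper $i: $lo..$hi"
    lo=$((hi + 1))
    i=$((i + 1))
  done
}

# Example: keys 1..1000 spread over 4 mappers (illustrative numbers).
split_ranges 1 1000 4
```

A unique, evenly distributed key gives slices of similar size. With a skewed column, some mappers get far more rows than others, and adding mappers helps less.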

New Contributor

@Bala Vignesh N V - Thanks for your reply.

When I try to run a Sqoop import from DB2:

sqoop import --connect "jdbc:db2://<connectionString>/<DBname>" --username <user> --password <pswd> --table <tablename> --split-by <primary key> --target-dir <dirPath> -m 10

I get the exception below, but if I remove "-m 10" from the above command it works absolutely fine with the default 4 +2 mappers:

Error: java.io.IOException: SQLException in nextKeyValue
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=駂;AT MICROSECONDS MICROSECOND SECONDS SECOND MINUTES MINUTE HOURS, DRIVER=4.19.66
    at com.ibm.db2.jcc.am.kd.a(kd.java:747)
    at com.ibm.db2.jcc.am.kd.a(kd.java:66)

@Rush

You have to correct the Sqoop syntax. The mapper parameter has to be specified before the target directory. Please correct the syntax and re-run the Sqoop command. It should work fine. Hope it helps!!
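For reference, here is how the command from the question looks with the mapper option moved ahead of --target-dir, as this reply suggests. It is written as a dry run that only prints the command. The helper name build_import_cmd is made up for illustration, and the values in angle brackets are the placeholders from the original post, left unfilled. The thread does not confirm whether the reordering actually clears the SQLCODE=-104 error, so treat it as something to try rather than a verified fix.

```shell
#!/bin/sh
# Dry run: prints the reordered command instead of executing it.
# build_import_cmd is a hypothetical helper; $1 is the number of mappers.
build_import_cmd() {
  printf '%s\n' "sqoop import --connect \"jdbc:db2://<connectionString>/<DBname>\" --username <user> --password <pswd> --table <tablename> --split-by <primary key> -m $1 --target-dir <dirPath>"
}

# Same mapper count as in the question.
build_import_cmd 10
```

Printing the command first makes it easy to check the option order before running the import against the database.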