Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Can I use split by with multiple mappers in scoop

Highlighted

Can I use split by with multiple mappers in scoop

New Contributor

How to increase the performance of scoop import using split by and. Mappers

3 REPLIES 3

Re: Can I use split by with multiple mappers in scoop

@Rashi Jain

To increase the performance of sqoop import increase the no of mappers depending on the source load and no of records which are ingested into HDFS. Also in split by try to use primary key through which you will be able to identify the unique records. So that the records will split into multiple mappers and the ingestion would work faster. Hope It Helps!!

Re: Can I use split by with multiple mappers in scoop

New Contributor

@Bala Vignesh N V - Thanks for your reply.

When I am trying to perform Sqoop import with DB2:

sqoop import --connect "jdbc:db2://<connectionString>/<DBname>" --username <user> --password <pswd> --table <tablename> --split-by <primary key> --target-dir <dirPath> -m 10

Getting below exeception, but if i remove "-m 10" from the above command it is working absolutely fine with default 4 +2 mappers:

Error: java.io.IOException: SQLException in nextKeyValue at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=駂;AT MICROSECONDS MICROSECOND SECONDS SECOND MINUTES MINUTE HOURS, DRIVER=4.19.66 at com.ibm.db2.jcc.am.kd.a(kd.java:747) at com.ibm.db2.jcc.am.kd.a(kd.java:66)

Re: Can I use split by with multiple mappers in scoop

@Rush

You have to correct the sqoop syntax. Mapper parameter has to be specified before the target directory. Please correct the syntax and re-trigger the sqoop command. It should work fine. Hope if helps!!

Don't have an account?
Coming from Hortonworks? Activate your account here