Support Questions

rashij_27 · ‎10-16-2017

How can I increase the performance of a Sqoop import using --split-by and the number of mappers?

balavignesh_nag · ‎10-16-2017

To increase the performance of a Sqoop import, increase the number of mappers according to the load the source database can handle and the number of records being ingested into HDFS. For the split-by column, try to use the primary key, since it uniquely identifies each record. That way the records are divided among multiple mappers and the ingestion runs faster. Hope it helps!!
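To make the splitting idea concrete, here is a minimal sketch of how a numeric split-by column can be divided among mappers. It assumes an integer key whose minimum and maximum are already known; Sqoop obtains these with a MIN/MAX bounding query on the split column. The function name split_ranges is hypothetical, and the arithmetic is a simplified illustration rather than Sqoop's actual splitter code.

```shell
# Hypothetical helper: print the key range each mapper would read,
# given the MIN and MAX of an integer split column and a mapper count.
# Simplified illustration only; not Sqoop's own implementation.
split_ranges() {
  min=$1; max=$2; mappers=$3
  span=$(( max - min + 1 ))          # number of key values to cover
  i=0
  while [ "$i" -lt "$mappers" ]; do
    lo=$(( min + span * i / mappers ))
    hi=$(( min + span * (i + 1) / mappers - 1 ))
    echo "mapper $i: $lo..$hi"
    i=$(( i + 1 ))
  done
}

# Example: keys 1..100 divided among 4 mappers
split_ranges 1 100 4
```

If the key values are evenly spread, each mapper gets a similar share of rows. A skewed or sparse column gives uneven ranges, which is why a well-distributed primary key is the usual choice.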

rashij_27 · ‎10-16-2017

@Bala Vignesh N V - Thanks for your reply.

When I am trying to perform Sqoop import with DB2:

sqoop import --connect "jdbc:db2://<connectionString>/<DBname>" --username <user> --password <pswd> --table <tablename> --split-by <primary key> --target-dir <dirPath> -m 10

I get the exception below, but if I remove "-m 10" from the above command it works absolutely fine with the default 4 +2 mappers:

Error: java.io.IOException: SQLException in nextKeyValue
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=駂;AT MICROSECONDS MICROSECOND SECONDS SECOND MINUTES MINUTE HOURS, DRIVER=4.19.66
    at com.ibm.db2.jcc.am.kd.a(kd.java:747)
    at com.ibm.db2.jcc.am.kd.a(kd.java:66)

balavignesh_nag · ‎10-17-2017

@Rush

You have to correct the Sqoop syntax. The mapper parameter has to be specified before the target directory. Please correct the syntax and re-run the Sqoop command. It should work fine. Hope it helps!!
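As a sketch of the reordering suggested here, the helper below prints the original command with -m placed before --target-dir. The function name print_import_cmd is hypothetical, the angle-bracket values are the placeholders from the question, and nothing is executed. Whether argument order is really the cause of the DB2 error is not confirmed in this thread.

```shell
# Hypothetical helper: print (do not run) the import command from the
# question, with the mapper count placed before the target directory.
# The <...> values are the thread's placeholders; substitute real ones.
print_import_cmd() {
  cat <<EOF
sqoop import \\
  --connect "jdbc:db2://<connectionString>/<DBname>" \\
  --username <user> --password <pswd> \\
  --table <tablename> \\
  --split-by <primary key> \\
  -m ${1:-10} \\
  --target-dir <dirPath>
EOF
}

# Show the command with 10 mappers
print_import_cmd 10
```

Printing the command first makes it easy to review the final argument order before pasting it into a shell with real values.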

Cloudera Community
