Can I use --split-by with multiple mappers in Sqoop?
Created 10-16-2017 02:45 AM
How can I increase the performance of a Sqoop import using --split-by and multiple mappers?
Created 10-16-2017 01:08 PM
To increase the performance of a Sqoop import, increase the number of mappers according to the load the source database can handle and the number of records being ingested into HDFS. For --split-by, try to use the primary key, which uniquely identifies each record, so that the records are split across multiple mappers and the ingestion runs faster. Hope it helps!
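As a rough illustration of why a unique numeric key splits well: Sqoop generally looks up the minimum and maximum of the --split-by column and gives each mapper a slice of that range. The sketch below is a simplified approximation of that idea, not Sqoop's actual splitter code; the column name `id`, the range 1 to 1000, and the mapper count of 4 are all hypothetical values chosen for the example.

```shell
#!/bin/sh
# Simplified illustration (NOT Sqoop's real implementation) of dividing
# a numeric key range among mappers.
# MIN_ID, MAX_ID, NUM_MAPPERS and the column name "id" are hypothetical.
MIN_ID=1
MAX_ID=1000
NUM_MAPPERS=4

split_ranges() {
  min=$1; max=$2; n=$3
  size=$(( (max - min + 1) / n ))   # rows of key space per mapper
  i=0
  lo=$min
  while [ "$i" -lt "$n" ]; do
    if [ "$i" -eq $(( n - 1 )) ]; then
      hi=$max                       # last mapper takes any remainder
    else
      hi=$(( lo + size - 1 ))
    fi
    echo "mapper $i: id >= $lo AND id <= $hi"
    lo=$(( hi + 1 ))
    i=$(( i + 1 ))
  done
}

split_ranges "$MIN_ID" "$MAX_ID" "$NUM_MAPPERS"
```

If the key values are spread unevenly across the range, some of these slices would hold far more rows than others, which is why a densely populated primary key tends to be a reasonable choice.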
Created 10-16-2017 03:47 PM
@Bala Vignesh N V - Thanks for your reply.
When I am trying to perform Sqoop import with DB2:
sqoop import --connect "jdbc:db2://<connectionString>/<DBname>" --username <user> --password <pswd> --table <tablename> --split-by <primary key> --target-dir <dirPath> -m 10
I get the exception below, but if I remove "-m 10" from the above command it works absolutely fine with the default 4 +2 mappers:
Error: java.io.IOException: SQLException in nextKeyValue
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=駂;AT MICROSECONDS MICROSECOND SECONDS SECOND MINUTES MINUTE HOURS, DRIVER=4.19.66
    at com.ibm.db2.jcc.am.kd.a(kd.java:747)
    at com.ibm.db2.jcc.am.kd.a(kd.java:66)
Created 10-17-2017 12:30 PM
You need to correct the Sqoop syntax. The mapper parameter has to be specified before the target directory. Please correct the syntax and re-run the Sqoop command. It should work fine. Hope it helps!
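For reference, here is a sketch of what the reordered command might look like, with the mapper option placed ahead of --target-dir as this reply suggests. The script only prints the command rather than running it; the `<...>` placeholders are kept exactly as in the question, and the helper name `build_import_cmd` is made up for this example.

```shell
#!/bin/sh
# Prints (does not execute) the import command from the question with
# "-m" moved before "--target-dir", as suggested in the reply.
# build_import_cmd is a hypothetical helper; the <...> placeholders come
# from the original post and must be filled in by the reader.
build_import_cmd() {
  mappers=$1
  printf '%s\n' "sqoop import --connect \"jdbc:db2://<connectionString>/<DBname>\" --username <user> --password <pswd> --table <tablename> --split-by <primary key> -m $mappers --target-dir <dirPath>"
}

build_import_cmd 10
```

Printing the command first makes it easy to check the option order before submitting the job to the cluster.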