Created on 03-23-2018 02:52 AM - edited 09-16-2022 06:01 AM
I'm importing 5.6 GB table
and the error:
Error: java.io.IOException: mysqldump terminated with status 2 at org.apache.sqoop.mapreduce.MySQLDumpMapper.map(MySQLDumpMapper.java:485) at org.apache.sqoop.mapreduce.MySQLDumpMapper.map(MySQLDumpMapper.java:49) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Created 03-23-2018 06:18 AM
So the issue is with the "PK" column used in distributing the data in case of multiple mappers. It has always been recommended that an "integral" column is used as the "split by" column and your import is trying to use the column "CustID" which is String.
Have a look at how your splits are calculated during the import.
8020 [main] WARN org.apache.sqoop.mapreduce.db.TextSplitter - You are strongly encouraged to choose an integral split column. 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '1'' and upper bound '`CustID` < '3?????'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '1'' and upper bound '`CustID` < '3?????'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '3?????'' and upper bound '`CustID` < '5?????'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '3?????'' and upper bound '`CustID` < '5?????'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '5?????'' and upper bound '`CustID` < '7*?????'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '5?????'' and upper bound '`CustID` < '7*?????'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '7*?????'' and upper bound '`CustID` <= '999999'' 8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '7*?????'' and upper bound '`CustID` <= '999999'' 8068 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:4
The "?" indicates some foreign characters probably not parsed properly and hence resulted in the failure of your tasks.
However, when you have only a single mapper, there is no such parse needed for CustID column and the data is "copied and pasted" to HDFS and your job ends OK.
Created 03-23-2018 10:00 PM
Okey, Thanks a lot for explaining it to me very well. . I really have a lot of things needed to learn for this job.