
No such file or directory and directory already exists error in sqoop



My Sqoop jobs are typically written like this:

sqoop job --meta-connect jdbc:hsqldb:hsql://IP:16000/sqoop --create mto_trackers -- import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://FQDN/db?zeroDateTimeBehavior=convertToNull \
  --username root --password 'pass' \
  --table MTO --merge-key id --split-by id \
  --hive-import --hive-overwrite --hive-database Erp --hive-drop-import-delims \
  --null-string '\\N' --null-non-string '\\N' \
  --fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  --input-fields-terminated-by '\001'

 

I execute them from Oozie:

sqoop job --meta-connect jdbc:hsqldb:hsql://IP:16000/sqoop --exec design_campaign -- --warehouse-dir ERP/Snapshots/${DATE}

 

The date is passed with minutes and seconds, so the job should never throw an "output directory already exists" error; the directory is only created at the time the job executes. Data is stored in Hive only, and the warehouse dir is purely transient.
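For reference, a minimal sketch of how the ${DATE} parameter could be derived (the exact format string here is an assumption on my part; the point is that no two runs should ever resolve to the same directory):

```shell
# Sketch only: build a per-run warehouse dir from the clock.
# The format string is illustrative; the coordinator may also append seconds.
DATE=$(date +'%Y-%m-%d--%H-%M')
WAREHOUSE_DIR="ERP/Snapshots/${DATE}"
echo "$WAREHOUSE_DIR"
```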

For the past couple of days, I have been getting:

Fetching child yarn jobs
tag id : oozie-83cce1910b064d488ee8238942434fba
2018-05-22 18:15:32,969 [main] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
Child yarn jobs are found - application_1526987357949_0572

Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
2018-05-22 18:15:33,795 [main] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
Killing job [application_1526987357949_0572] ... 2018-05-22 18:15:34,069 [main] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Killed application application_1526987357949_0572
Done

=================================================================

>>> Invoking Sqoop command line now >>>

2018-05-22 18:15:34,236 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2018-05-22 18:15:34,452 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6-cdh5.10.1
2018-05-22 18:15:35,328 [main] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2018-05-22 18:15:35,411 [main] WARN  org.apache.sqoop.ConnFactory  - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
2018-05-22 18:15:35,435 [main] INFO  org.apache.sqoop.manager.SqlManager  - Using default fetchSize of 1000
2018-05-22 18:15:35,461 [main] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
2018-05-22 18:15:36,226 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM customer_cancel AS t WHERE 1=0
2018-05-22 18:15:36,274 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM customer_cancel AS t WHERE 1=0
2018-05-22 18:15:36,331 [main] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/hadoop-mapreduce
2018-05-22 18:15:42,128 [main] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/13a780bf808eee713a33774952bb182b/customer_cancel.jar
2018-05-22 18:15:42,163 [main] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of customer_cancel
2018-05-22 18:15:42,164 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2018-05-22 18:15:42,226 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2018-05-22 18:15:42,229 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM customer_cancel AS t WHERE 1=0
2018-05-22 18:15:42,293 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2018-05-22 18:15:42,296 [main] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
2018-05-22 18:15:42,450 [main] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
2018-05-22 18:15:42,914 [main] WARN  org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:hue (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://ip-172-31-4-192.ap-south-1.compute.internal:8020/user/hue/ERP/Snapshots/2018-05-22--17-31/cus... already exists
2018-05-22 18:15:42,914 [main] ERROR org.apache.sqoop.tool.ImportTool  - Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://ip-172-31-4-192.ap-south-1.compute.internal:8020/user/hue/ERP/Snapshots/2018-05-22--17-31/cus... already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
	at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
	at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
	at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:243)
	at org.apache.sqoop.tool.JobTool.run(JobTool.java:298)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
	at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:196)
	at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:60)
	at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
<<< Invocation of Sqoop command completed <<<

No child hadoop job is executed.
Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-172-31-4-192.ap-south-1.compute.internal:8020/user/hue/oozie-oozi/0000013-180522164019872-...
2018-05-22 18:15:43,119 [main] INFO  org.apache.hadoop.io.compress.zlib.ZlibFactory  - Successfully loaded & initialized native-zlib library
2018-05-22 18:15:43,120 [main] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.deflate]
Successfully reset security manager from org.apache.oozie.action.hadoop.LauncherSecurityManager@1e0c006f to null

Oozie Launcher ends

2018-05-22 18:15:43,714 [main] INFO  org.apache.hadoop.mapred.Task  - Task:attempt_1526987357949_0566_m_000000_1 is done. And is in the process of committing
2018-05-22 18:15:43,714 [main] INFO  org.apache.hadoop.mapred.Task  - Task:attempt_1526987357949_0566_m_000000_1 is done. And is in the process of committing
2018-05-22 18:15:44,180 [main] INFO  org.apache.hadoop.mapred.Task  - Task 'attempt_1526987357949_0566_m_000000_1' done.
2018-05-22 18:15:44,180 [main] INFO  org.apache.hadoop.mapred.Task  - Task 'attempt_1526987357949_0566_m_000000_1' done.
2018-05-22 18:15:44,289 [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - Stopping MapTask metrics system...
2018-05-22 18:15:44,290 [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - MapTask metrics system stopped.
2018-05-22 18:15:44,290 [main] INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl  - MapTask metrics system shutdown complete.

 

How do I fix this, and why does it happen? It occurs quite randomly: it happens on one run and not the next, and it can hit any of the scheduled Sqoop jobs.
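The only workaround I can think of is to remove the transient warehouse dir right before re-executing the saved job, so that a retried launcher cannot trip over a directory left behind by its own first attempt. A minimal sketch (DRY_RUN=1 only prints the commands instead of running them; the paths and job name are the ones from my setup):

```shell
# Workaround sketch: clean the transient warehouse dir before --exec.
# With DRY_RUN=1 the commands are only echoed, not executed.
DATE="2018-05-22--17-31"            # normally injected by the coordinator
DIR="ERP/Snapshots/${DATE}"
DRY_RUN=1

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"                       # dry run: print the command line
  else
    "$@"                            # real run: execute it
  fi
}

run hdfs dfs -rm -r -f "$DIR"
run sqoop job --meta-connect jdbc:hsqldb:hsql://IP:16000/sqoop \
    --exec design_campaign -- --warehouse-dir "$DIR"
```

Sqoop also documents a --delete-target-dir option for imports, though I am not sure whether it applies when a saved job uses --warehouse-dir.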

 

Apart from this, one of my other jobs was killed with this error:

 

Log Upload Time: Tue May 22 18:49:22 +0530 2018
Log Length: 327
error: error reading /tmp/sqoop-yarn/compile/923e1d3c889eccef0d517d2f4308697b/QueryResult.java
1 error
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

 

At other times, it is unable to find the sqoop_job-NAME.jar file in the /tmp/sqoop-yarn/ folder.

I cannot work out what exactly is causing this. It does not seem to be a problem with my Sqoop jobs themselves, but rather something within the cluster that is not working as it should. I cannot find errors in any of my server logs.

Could I please get some help with this? The fact that these two problems started happening together surely points to a common cause. It would be really great to get some ideas or direction on this.