
Killing existing jobs and starting over sqoop and oozie

    Sqoop command arguments :
        job
        --meta-connect
        jdbc:hsqldb:hsql://IP:16000/sqoop
        --exec
        price_range
        --
        --warehouse-dir
        folder/transit/2018-04-16--11-48
    Fetching child yarn jobs
    tag id : oozie-e678030f4db3e129377fc1efdcc34e9a
    2018-04-16 11:49:36,693 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
    Child yarn jobs are found - application_1519975798846_265571
    Found [1] Map-Reduce jobs from this launcher
    Killing existing jobs and starting over:
    2018-04-16 11:49:37,314 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
    Killing job [application_1519975798846_265571] ...
    2018-04-16 11:49:37,334 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1519975798846_265571
    Done

This is what my typical sqoop job looks like:

    sqoop job -Dmapred.reduce.tasks=3 --meta-connect jdbc:hsqldb:hsql://IP:16000/sqoop --create job_name -- import --driver com.mysql.jdbc.Driver --connect 'jdbc:mysql://ip2/erp?zeroDateTimeBehavior=convertToNull&serverTimezone=IST' --username username --password 'PASS' --table orders --merge-key order_num --split-by order_num --hive-import --hive-overwrite --hive-database Erp --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001' --m 12

This is how I execute my jobs in Oozie:

    job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop --exec JOB_NAME -- --warehouse-dir folder/transit/${DATE}
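To make each run's warehouse dir unique, the ${DATE} value has to differ between attempts. A minimal sketch (assuming the `2018-04-16--11-48` format seen in the log above) that adds seconds so two runs in the same minute cannot collide:

```shell
#!/usr/bin/env bash
# Build a per-run timestamp; including %S avoids reusing the same
# directory when a job is retried within the same minute.
DATE=$(date +%Y-%m-%d--%H-%M-%S)
echo "$DATE"

# The job would then be executed with a run-specific warehouse dir
# (sketch only; JOB_NAME and the metastore address are placeholders):
# sqoop job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop \
#   --exec JOB_NAME -- --warehouse-dir "folder/transit/${DATE}"
```

Note that this alone does not help when Oozie restarts the *same* launcher attempt, since ${DATE} is resolved once per workflow run.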

Recently I started getting an "output directory already exists" error no matter what timestamp I pass in the ${DATE} variable. It shows up at random on any sqoop job in Oozie.

I added --warehouse-dir folder/Snapshots/${DATE} when executing the job precisely so that I would never get "output directory already exists", but it started happening yesterday out of nowhere. I currently see no signs of any services acting up.

The error message makes it fairly clear why it happens: the warehouse dir gets created before Oozie attempts to restart the job. However, the whole point of using --warehouse-dir was to stage the import in a transitional directory so that I would not hit this error. How do I fix this?
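One possible workaround, given the "Killing existing jobs and starting over" behavior in the log: have Oozie delete the warehouse dir before each (re)run so a directory left behind by a killed attempt cannot collide. A sketch of a sqoop action with a `prepare`/`delete` block (the action name, the `folder/transit` path, and the `DATE`/`nameNode`/`jobTracker` parameters are assumptions based on the commands in this post, not confirmed workflow contents):

```xml
<!-- Sketch: delete the run's warehouse dir before the sqoop action
     executes, so a leftover directory from a killed attempt cannot
     trigger "output directory already exists" on restart. -->
<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/${wf:user()}/folder/transit/${DATE}"/>
        </prepare>
        <command>job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop --exec JOB_NAME -- --warehouse-dir folder/transit/${DATE}</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Since `prepare` runs on every attempt of the action, including launcher retries, the directory is cleaned even when Oozie kills the child YARN job and starts over.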
