Support Questions
Find answers, ask questions, and share your expertise

Question - Is there any way to skip checking for jobHistory server to avoid Oozie job failures

Solved


Super Guru

Is there any way to skip the check against the JobHistory server to avoid Oozie job failures? Sometimes when the JobHistory server is down or being restarted, an Oozie job fails with an "Unknown job id" error (it fails to connect to the JobHistory server, which is expected).

Any idea how we can work around this error? Is there a timeout parameter?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Question - Is there any way to skip checking for jobHistory server to avoid Oozie job failures

Super Guru

I did a bit of research, looked into the code, and found that there is currently no timeout parameter at the Oozie level. I have raised an internal enhancement request for this.

##Snippet from JavaActionExecutor.java##

try {
    Element actionXml = XmlUtils.parseXml(action.getConf());
    FileSystem actionFs = context.getAppFileSystem();
    JobConf jobConf = createBaseHadoopConf(context, actionXml);
    jobClient = createJobClient(context, jobConf);
    RunningJob runningJob = getRunningJob(context, action, jobClient);
    if (runningJob == null) {
        context.setExecutionData(FAILED, null);
        throw new ActionExecutorException(ActionExecutorException.ErrorType.FAILED, "JA017",
            "Unknown hadoop job [{0}] associated with action [{1}]. Failing this action!",
            action.getExternalId(), action.getId());
    }
    // ... rest of the try block elided ...

protected RunningJob getRunningJob(Context context, WorkflowAction action, JobClient jobClient) throws Exception {
    RunningJob runningJob = jobClient.getJob(JobID.forName(action.getExternalId()));
    return runningJob;
}

##Snippet from MapReduce code (JobClient.java)##

public RunningJob getJob(JobID jobid) throws IOException {
    JobStatus status = jobSubmitClient.getJobStatus(jobid);
    JobProfile profile = jobSubmitClient.getJobProfile(jobid);
    if (status != null && profile != null) {
        return new NetworkedJob(status, profile, jobSubmitClient);
    } else {
        return null;
    }
}

##Snippet from JobSubmissionProtocol.java (MapReduce code)##

/**
 * Grab a handle to a job that is already known to the JobTracker.
 * @return Status of the job, or null if not found.
 */
public JobStatus getJobStatus(JobID jobid) throws IOException;

So the failure comes from `getJob()` returning null when the JobHistory server cannot be reached, which Oozie treats as an unknown job. I got the answer to my question! :)
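Until Oozie grows such a retry/timeout option, the idea behind the enhancement request can be sketched as a generic wrapper: retry a lookup that returns null while the service behind it is restarting. This is a hypothetical helper, not Oozie code; the class and method names (`NullRetry`, `retryUntilNonNull`) are made up for illustration.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a retry wrapper around a lookup (e.g. jobClient.getJob)
 * that may return null while a backing service such as the JobHistory server
 * is being restarted. Not part of Oozie.
 */
public class NullRetry {

    public static <T> T retryUntilNonNull(Supplier<T> lookup,
                                          int maxAttempts,
                                          long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();          // e.g. jobClient.getJob(jobId)
            if (result != null) {
                return result;                // service answered; job found
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);    // back off while the service restarts
            }
        }
        return null;                          // still unknown after all attempts
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a service that only answers on the third call.
        final int[] calls = {0};
        String result = retryUntilNonNull(
                () -> ++calls[0] < 3 ? null : "job_123", 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: job_123 after 3 attempts
    }
}
```

With a wrapper like this, a JA017 failure would only be raised after the retries are exhausted, instead of on the first null from `getJob()` during a JobHistory server restart.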



Re: Question - Is there any way to skip checking for jobHistory server to avoid Oozie job failures

Maybe you should try troubleshooting your JobHistory server instead; it's not supposed to be down. A busy JobHistory server needs a reasonable amount of memory, say 8 GB or more.
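As a rough sketch of the sizing advice above: in a plain Hadoop install the JobHistory server heap is typically set in mapred-env.sh via `HADOOP_JOB_HISTORYSERVER_HEAPSIZE` (value in MB). The exact variable and file can vary by Hadoop version and distribution; on an Ambari-managed cluster you would set this through the MapReduce2 service configs instead of editing the file directly.

```shell
# mapred-env.sh -- JobHistory server heap size, in MB.
# Variable name/location may differ across Hadoop versions and distributions;
# 8192 (8 GB) here follows the sizing suggested in the reply above.
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=8192
```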

Re: Question - Is there any way to skip checking for jobHistory server to avoid Oozie job failures

Super Guru

@Predrag Minovic - Yes, that's correct, but if the JHS is being restarted for some reason and Oozie tries to connect to it, jobs will fail. I'm looking for a timeout parameter that can hold jobs until the JHS is back.

