Jobtracker HA and oozie

Rising Star

Hello, I have a Hadoop cluster running CDH 4.4.0 with HDFS HA and JobTracker HA.

I tried to launch an Oozie workflow but got this:

 

JA006: Call From xxxx to xxxx failed on connection exception: java.net.ConnectException: Connection refused; For more details see:http://wiki.apache.org/hadoop/ConnectionRefused

 

I found this: https://issues.cloudera.org/browse/HUE-1631

 

But I don't know how to fix my problem. Where do I configure Oozie to work with JobTracker HA?

 

1 ACCEPTED SOLUTION

Super Collaborator

Hey Markovich,

 

Are you running the workflow from inside Hue or from the Oozie command line?  If from within Hue, then you are running into HUE-1631 and it won't be fixed until a later release of CDH.  If you are not using Hue, can you attach your job.properties file and workflow?

 

Thanks
Chris

Rising Star

Hi, Chris!

 

I am running the workflow from inside HUE.

Thanks for the reply.

 

Maybe you can also help me with this question (http://community.cloudera.com/t5/Batch-Processing-and-Workflow/How-to-limit-jobcache-foledr-size/m-p...)?

 

Thanks,

Andrey

Super Collaborator

Hey Markovich, 

 

I definitely think you are running into that bug then.  It's worth noting that it should only fail on existing workflows that point to the wrong JT.  If you create a new workflow it should work until the next JT failover.  Do new workflows work for you?

 

I will take a look at your other question as well.

 

Thanks

Chris

Rising Star

Hey Chris,

 

Yes, new workflows work until JT failover. I am running my jobs from the command line now.

I have noticed another strange thing.

Here is the output of my MapReduce job:

 

13/10/25 18:00:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/25 18:00:56 INFO input.FileInputFormat: Total input paths to process : 12451
13/10/25 18:01:06 INFO mapred.JobClient: Running job: job_201310251009_0004
13/10/25 18:01:07 INFO mapred.JobClient:  map 0% reduce 0%
13/10/25 18:03:48 INFO mapred.JobClient:  map 1% reduce 0%
13/10/25 18:08:25 INFO mapred.JobClient:  map 2% reduce 0%
13/10/25 18:12:58 INFO mapred.JobClient:  map 3% reduce 0%
13/10/25 18:17:21 INFO mapred.JobClient:  map 4% reduce 0%
13/10/25 18:21:53 INFO mapred.JobClient:  map 5% reduce 0%
13/10/25 18:26:26 INFO mapred.JobClient:  map 6% reduce 0%
13/10/25 18:30:59 INFO mapred.JobClient:  map 7% reduce 0%
13/10/25 18:35:26 INFO mapred.JobClient:  map 8% reduce 0%
13/10/25 18:39:51 INFO mapred.JobClient:  map 9% reduce 0%
13/10/25 18:44:29 INFO mapred.JobClient:  map 10% reduce 0%
13/10/25 18:48:53 INFO mapred.JobClient:  map 11% reduce 0%
13/10/25 18:53:30 INFO mapred.JobClient:  map 12% reduce 0%
13/10/25 18:57:52 INFO mapred.JobClient:  map 13% reduce 0%
13/10/25 19:02:11 INFO mapred.JobClient:  map 14% reduce 0%
13/10/25 19:06:32 INFO mapred.JobClient:  map 15% reduce 0%
13/10/25 19:10:59 INFO mapred.JobClient:  map 16% reduce 0%
13/10/25 19:15:19 INFO mapred.JobClient:  map 17% reduce 0%
13/10/25 19:19:38 INFO mapred.JobClient:  map 18% reduce 0%
13/10/25 19:23:55 INFO mapred.JobClient:  map 19% reduce 0%
13/10/25 19:28:27 INFO mapred.JobClient:  map 20% reduce 0%
13/10/25 19:32:46 INFO mapred.JobClient:  map 21% reduce 0%
13/10/25 19:37:16 INFO mapred.JobClient:  map 22% reduce 0%
13/10/25 19:41:44 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/25 19:41:44 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 1200ms.
13/10/25 19:41:45 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1403ms.
13/10/25 19:41:47 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 2140ms.
13/10/25 19:41:49 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 1934ms.
13/10/25 19:41:55 INFO mapred.JobClient:  map 0% reduce 0%
13/10/25 19:44:50 INFO mapred.JobClient:  map 1% reduce 0%
13/10/25 19:49:18 INFO mapred.JobClient:  map 2% reduce 0%
13/10/25 19:53:51 INFO mapred.JobClient:  map 3% reduce 0%
13/10/25 19:58:23 INFO mapred.JobClient:  map 4% reduce 0%
13/10/25 20:02:53 INFO mapred.JobClient:  map 5% reduce 0%
13/10/25 20:07:23 INFO mapred.JobClient:  map 6% reduce 0%
13/10/25 20:11:54 INFO mapred.JobClient:  map 7% reduce 0%
13/10/25 20:16:25 INFO mapred.JobClient:  map 8% reduce 0%
13/10/25 20:20:54 INFO mapred.JobClient:  map 9% reduce 0%
13/10/25 20:25:16 INFO mapred.JobClient:  map 10% reduce 0%
13/10/25 20:29:54 INFO mapred.JobClient:  map 11% reduce 0%
....

....

...

13/10/26 03:28:34 INFO mapred.JobClient:  map 54% reduce 0%
13/10/26 03:32:51 INFO mapred.JobClient:  map 55% reduce 0%
13/10/26 03:34:58 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/26 03:34:58 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 862ms.
13/10/26 03:34:59 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1076ms.
13/10/26 03:35:00 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 1243ms.
13/10/26 03:35:02 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 983ms.
13/10/26 03:35:06 INFO mapred.JobClient:  map 0% reduce 0%
13/10/26 03:38:08 INFO mapred.JobClient:  map 1% reduce 0%
13/10/26 03:42:36 INFO mapred.JobClient:  map 2% reduce 0%
13/10/26 03:47:05 INFO mapred.JobClient:  map 3% reduce 0%
13/10/26 03:51:34 INFO mapred.JobClient:  map 4% reduce 0%
13/10/26 03:56:01 INFO mapred.JobClient:  map 5% reduce 0%
13/10/26 04:00:36 INFO mapred.JobClient:  map 6% reduce 0%
13/10/26 04:05:01 INFO mapred.JobClient:  map 7% reduce 0%
13/10/26 04:09:31 INFO mapred.JobClient:  map 8% reduce 0%
13/10/26 04:13:55 INFO mapred.JobClient:  map 9% reduce 0%
13/10/26 04:18:27 INFO mapred.JobClient:  map 10% reduce 0%
13/10/26 04:23:09 INFO mapred.JobClient:  map 11% reduce 0%
13/10/26 04:27:42 INFO mapred.JobClient:  map 12% reduce 0%
13/10/26 04:31:59 INFO mapred.JobClient:  map 13% reduce 0%
13/10/26 04:36:21 INFO mapred.JobClient:  map 14% reduce 0%
13/10/26 04:40:49 INFO mapred.JobClient:  map 15% reduce 0%
13/10/26 04:45:20 INFO mapred.JobClient:  map 16% reduce 0%
13/10/26 04:49:38 INFO mapred.JobClient:  map 17% reduce 0%
13/10/26 04:53:59 INFO mapred.JobClient:  map 18% reduce 0%
13/10/26 04:58:21 INFO mapred.JobClient:  map 19% reduce 0%
13/10/26 05:02:42 INFO mapred.JobClient:  map 20% reduce 0%
13/10/26 05:06:57 INFO mapred.JobClient:  map 21% reduce 0%
13/10/26 05:11:20 INFO mapred.JobClient:  map 22% reduce 0%
13/10/26 05:15:36 INFO mapred.JobClient:  map 23% reduce 0%
13/10/26 05:19:55 INFO mapred.JobClient:  map 24% reduce 0%
13/10/26 05:24:26 INFO mapred.JobClient:  map 25% reduce 0%
13/10/26 05:28:48 INFO mapred.JobClient:  map 26% reduce 0%
13/10/26 05:33:16 INFO mapred.JobClient:  map 27% reduce 0%
13/10/26 05:37:43 INFO mapred.JobClient:  map 28% reduce 0%
13/10/26 05:42:09 INFO mapred.JobClient:  map 29% reduce 0%
13/10/26 05:46:36 INFO mapred.JobClient:  map 30% reduce 0%
13/10/26 05:48:53 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/26 05:48:53 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 1184ms.
13/10/26 05:48:55 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1239ms.
13/10/26 05:48:56 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 2134ms.
13/10/26 05:48:58 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 1728ms.
13/10/26 05:49:01 INFO mapred.JobClient:  map 0% reduce 0%
13/10/26 05:52:05 INFO mapred.JobClient:  map 1% reduce 0%
13/10/26 05:56:44 INFO mapred.JobClient:  map 2% reduce 0%
13/10/26 06:01:27 INFO mapred.JobClient:  map 3% reduce 0%
13/10/26 06:06:02 INFO mapred.JobClient:  map 4% reduce 0%
13/10/26 06:10:34 INFO mapred.JobClient:  map 5% reduce 0%
13/10/26 06:15:13 INFO mapred.JobClient:  map 6% reduce 0%
13/10/26 06:19:49 INFO mapred.JobClient:  map 7% reduce 0%
13/10/26 06:21:46 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/26 06:21:46 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 571ms.
13/10/26 06:21:47 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1910ms.
13/10/26 06:21:49 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 1982ms.
13/10/26 06:21:51 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 1868ms.
13/10/26 06:21:56 INFO mapred.JobClient:  map 0% reduce 0%
13/10/26 06:24:52 INFO mapred.JobClient:  map 1% reduce 0%
13/10/26 06:29:42 INFO mapred.JobClient:  map 2% reduce 0%
13/10/26 06:34:32 INFO mapred.JobClient:  map 3% reduce 0%
13/10/26 06:39:12 INFO mapred.JobClient:  map 4% reduce 0%
13/10/26 06:43:48 INFO mapred.JobClient:  map 5% reduce 0%

 

The job was running when the active JT failed (or was forced out by the standby JT), and the standby JT became active.

That would be fine, since they just switched roles, but the job restarted from 0% of mappers. This happens every time the JTs switch roles.
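
For a long client log like the one above, a small script can list every point where map progress went backwards. This is only a sketch: the function name `find_restarts` is hypothetical, and it assumes the `mapred.JobClient` progress lines keep the exact format shown in this post.

```python
import re

# Matches JobClient progress lines such as:
# "13/10/25 18:03:48 INFO mapred.JobClient:  map 1% reduce 0%"
PROGRESS = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapred\.JobClient:"
    r"\s+map (\d+)% reduce (\d+)%"
)


def find_restarts(lines):
    """Return (timestamp, previous_map_pct) for each drop in map progress."""
    restarts = []
    previous = None
    for line in lines:
        match = PROGRESS.match(line)
        if not match:
            continue  # skip WARN lines and anything that is not a progress line
        timestamp, map_pct = match.group(1), int(match.group(2))
        if previous is not None and map_pct < previous:
            restarts.append((timestamp, previous))
        previous = map_pct
    return restarts


# A few lines copied from the log in this post.
sample = [
    "13/10/25 19:37:16 INFO mapred.JobClient:  map 22% reduce 0%",
    "13/10/25 19:41:44 WARN retry.RetryInvocationHandler: Exception while "
    "invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.",
    "13/10/25 19:41:55 INFO mapred.JobClient:  map 0% reduce 0%",
    "13/10/25 19:44:50 INFO mapred.JobClient:  map 1% reduce 0%",
]

for timestamp, lost in find_restarts(sample):
    print(f"{timestamp}: map progress dropped from {lost}%")
```

Comparing the reported timestamps with the `RetryInvocationHandler` warnings shows whether each reset lines up with a failover attempt.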

 

 

Maybe I did not configure the failover controller properly?

 

This is a very strange message, and I can't find anything about it: (WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.)

 

 

In the JT logs there are only WARN messages:

WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

 

 Thanks

Andrey

 

Super Collaborator

Hey Andrey,

 

Based on the description from the Hue side, you are definitely running into the Hue bug.  Jobs would work until the JT fails over and then stop.  

 

As for the MapReduce issue, I'm not actually sure; I'll take a look around. It may be worthwhile to post that in the MR community as well...

 

Thanks

Chris

Super Collaborator

Hey Andrey,

 

Somehow I thought this forum was strictly Oozie :-). Sorry for the confusion; this is the right place for MR, and maybe one of the MR folks can chime in on the MR side of this question. I will still take a look around and see if I can come up with something.

 

Thanks

Chris

Rising Star

Hey Chris,

 

That's ok:)

 

Thanks

Andrey

New Contributor

We ran into a similar issue with HA JobTracker, Oozie, and Hue. We can submit jobs to Oozie from the command line with jobTracker=logicaljt, which works fine, but when they fail and we want to rerun them through Hue, Hue replaces the jobTracker variable with the first JobTracker it can find. Unfortunately for us, that is currently the standby one.
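
To spot workflows whose jobTracker value has been pinned to a single host, a small check over job.properties can help. This is a rough sketch, not part of Oozie or Hue: the helper names are hypothetical, the `nameNode` value in the sample is made up, and the `host:port` test is only a heuristic for telling a physical address from a logical name such as `logicaljt`.

```python
import re


def job_tracker_value(properties_text):
    """Return the value of the jobTracker key, or None if it is not set."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and properties-file comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "jobTracker":
            return value.strip()
    return None


def looks_like_host_port(value):
    """Heuristic: 'host:port' suggests a single JT, not a logical HA name."""
    return re.fullmatch(r"[^:\s]+:\d+", value) is not None


# Hypothetical job.properties content; only jobTracker=logicaljt is from the post.
sample = """
nameNode=hdfs://nameservice1
jobTracker=logicaljt
"""

value = job_tracker_value(sample)
if value is None:
    print("jobTracker is not set")
elif looks_like_host_port(value):
    print(f"jobTracker={value} points at one host and may break after failover")
else:
    print(f"jobTracker={value} looks like a logical name")
```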

 

We managed to work around it by adding the correct jobtracker to the hue.ini safety valve:

 

[hadoop]
  [[mapred_clusters]]
    [[[default]]]
      jobtracker_host=correct-jobtracker.example.com

 

Of course we're primarily looking forward to getting Hue 3.5 where this issue is fixed...

Super Collaborator

This is also fixed in CDH 4.6, and there is a patched version of CDH 4.5 that resolves it as well. If you are running CDH 4.5, you can open a support ticket to get the patched version. You would have to add a new parameter, "logical_name", for example:

 

[hadoop]
  [[mapred_clusters]]
    [[[default]]]
      jobtracker_host=cdh45-2.qa.test.com
      thrift_port=9290
      jobtracker_port=8021
      submit_to=true
      hadoop_mapred_home={{HADOOP_MR1_HOME}}
      hadoop_bin={{HADOOP_BIN}}
      hadoop_conf_dir={{HADOOP_CONF_DIR}}
      security_enabled=true
      logical_name=logicaljt
    [[[jtha]]]
      jobtracker_host=cdh45-1.qa.test.com
      thrift_port=9290
      jobtracker_port=8021
      submit_to=true
      hadoop_mapred_home={{HADOOP_MR1_HOME}}
      hadoop_bin={{HADOOP_BIN}}
      hadoop_conf_dir={{HADOOP_CONF_DIR}}
      security_enabled=true
      logical_name=logicaljt

 

Hope this helps.
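
Since both cluster entries need the same logical_name, it can be worth checking a hue.ini snippet before restarting Hue. The following is a minimal sketch with a hand-rolled parser for the nested `[[...]]` sections; it is not how Hue itself reads its configuration, and the function names are hypothetical.

```python
import re


def cluster_settings(ini_text):
    """Collect key=value pairs for each [[[cluster]]] under [[mapred_clusters]]."""
    clusters = {}
    in_mapred = False
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"(\[+)([^\[\]]+)(\]+)", line)
        if header:
            depth, name = len(header.group(1)), header.group(2).strip()
            if depth <= 2:
                # A new top-level or second-level section ends any open cluster.
                in_mapred = depth == 2 and name == "mapred_clusters"
                current = None
            elif depth == 3 and in_mapred:
                current = clusters.setdefault(name, {})
            else:
                current = None
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return clusters


def missing_logical_name(ini_text):
    """Return the names of clusters that do not set logical_name."""
    return sorted(
        name
        for name, settings in cluster_settings(ini_text).items()
        if not settings.get("logical_name")
    )


# Trimmed version of the snippet above, with logical_name left out of [[[jtha]]].
sample = """
[hadoop]
[[mapred_clusters]]
[[[default]]]
jobtracker_host=cdh45-2.qa.test.com
logical_name=logicaljt
[[[jtha]]]
jobtracker_host=cdh45-1.qa.test.com
"""

print(missing_logical_name(sample))
```

An empty list means every JobTracker entry carries a logical_name; any name in the list is an entry that still needs it.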