Created on 10-24-2013 01:03 PM - edited 09-16-2022 01:49 AM
Hello, I have a Hadoop cluster running CDH 4.4.0 with HDFS HA and JobTracker HA.
I tried to launch an Oozie workflow but got this error:
JA006: Call From xxxx to xxxx failed on connection exception: java.net.ConnectException: Connection refused; For more details see:http://wiki.apache.org/hadoop/ConnectionRefused
I found this: https://issues.cloudera.org/browse/HUE-1631
But I don't know how to fix the problem. Where do I configure Oozie to work with JobTracker HA?
Created 10-24-2013 01:21 PM
Hey Markovich,
Are you running the workflow from inside Hue or from the Oozie command line? If from within Hue, then you are running into HUE-1631 and it won't be fixed until a later release of CDH. If you are not using Hue, can you attach your job.properties file and workflow?
Thanks
Chris
Created 10-25-2013 04:18 AM
Hi, Chris!
I am running the workflow from inside HUE.
Thanks for the reply.
Maybe you can also help me with this question (http://community.cloudera.com/t5/Batch-Processing-and-Workflow/How-to-limit-jobcache-foledr-size/m-p...)?
Thanks,
Andrey
Created 10-25-2013 06:33 AM
Hey Markovich,
I definitely think you are running into that bug then. It's worth noting that it should only fail on existing workflows that point to the wrong JT. If you create a new workflow it should work until the next JT failover. Do new workflows work for you?
I will take a look at your other question as well.
Thanks
Chris
Created 10-26-2013 12:24 AM
Hey Chris,
Yes, new workflows work until JT failover. I am running my jobs from the command line now.
I have run into another strange issue.
Here is the output of my MapReduce job:
13/10/25 18:00:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/25 18:00:56 INFO input.FileInputFormat: Total input paths to process : 12451
13/10/25 18:01:06 INFO mapred.JobClient: Running job: job_201310251009_0004
13/10/25 18:01:07 INFO mapred.JobClient: map 0% reduce 0%
13/10/25 18:03:48 INFO mapred.JobClient: map 1% reduce 0%
13/10/25 18:08:25 INFO mapred.JobClient: map 2% reduce 0%
13/10/25 18:12:58 INFO mapred.JobClient: map 3% reduce 0%
13/10/25 18:17:21 INFO mapred.JobClient: map 4% reduce 0%
13/10/25 18:21:53 INFO mapred.JobClient: map 5% reduce 0%
13/10/25 18:26:26 INFO mapred.JobClient: map 6% reduce 0%
13/10/25 18:30:59 INFO mapred.JobClient: map 7% reduce 0%
13/10/25 18:35:26 INFO mapred.JobClient: map 8% reduce 0%
13/10/25 18:39:51 INFO mapred.JobClient: map 9% reduce 0%
13/10/25 18:44:29 INFO mapred.JobClient: map 10% reduce 0%
13/10/25 18:48:53 INFO mapred.JobClient: map 11% reduce 0%
13/10/25 18:53:30 INFO mapred.JobClient: map 12% reduce 0%
13/10/25 18:57:52 INFO mapred.JobClient: map 13% reduce 0%
13/10/25 19:02:11 INFO mapred.JobClient: map 14% reduce 0%
13/10/25 19:06:32 INFO mapred.JobClient: map 15% reduce 0%
13/10/25 19:10:59 INFO mapred.JobClient: map 16% reduce 0%
13/10/25 19:15:19 INFO mapred.JobClient: map 17% reduce 0%
13/10/25 19:19:38 INFO mapred.JobClient: map 18% reduce 0%
13/10/25 19:23:55 INFO mapred.JobClient: map 19% reduce 0%
13/10/25 19:28:27 INFO mapred.JobClient: map 20% reduce 0%
13/10/25 19:32:46 INFO mapred.JobClient: map 21% reduce 0%
13/10/25 19:37:16 INFO mapred.JobClient: map 22% reduce 0%
13/10/25 19:41:44 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/25 19:41:44 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 1200ms.
13/10/25 19:41:45 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1403ms.
13/10/25 19:41:47 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 2140ms.
13/10/25 19:41:49 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 1934ms.
13/10/25 19:41:55 INFO mapred.JobClient: map 0% reduce 0%
13/10/25 19:44:50 INFO mapred.JobClient: map 1% reduce 0%
13/10/25 19:49:18 INFO mapred.JobClient: map 2% reduce 0%
13/10/25 19:53:51 INFO mapred.JobClient: map 3% reduce 0%
13/10/25 19:58:23 INFO mapred.JobClient: map 4% reduce 0%
13/10/25 20:02:53 INFO mapred.JobClient: map 5% reduce 0%
13/10/25 20:07:23 INFO mapred.JobClient: map 6% reduce 0%
13/10/25 20:11:54 INFO mapred.JobClient: map 7% reduce 0%
13/10/25 20:16:25 INFO mapred.JobClient: map 8% reduce 0%
13/10/25 20:20:54 INFO mapred.JobClient: map 9% reduce 0%
13/10/25 20:25:16 INFO mapred.JobClient: map 10% reduce 0%
13/10/25 20:29:54 INFO mapred.JobClient: map 11% reduce 0%
....
....
...
13/10/26 03:28:34 INFO mapred.JobClient: map 54% reduce 0%
13/10/26 03:32:51 INFO mapred.JobClient: map 55% reduce 0%
13/10/26 03:34:58 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/26 03:34:58 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 862ms.
13/10/26 03:34:59 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1076ms.
13/10/26 03:35:00 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 1243ms.
13/10/26 03:35:02 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 983ms.
13/10/26 03:35:06 INFO mapred.JobClient: map 0% reduce 0%
13/10/26 03:38:08 INFO mapred.JobClient: map 1% reduce 0%
13/10/26 03:42:36 INFO mapred.JobClient: map 2% reduce 0%
13/10/26 03:47:05 INFO mapred.JobClient: map 3% reduce 0%
13/10/26 03:51:34 INFO mapred.JobClient: map 4% reduce 0%
13/10/26 03:56:01 INFO mapred.JobClient: map 5% reduce 0%
13/10/26 04:00:36 INFO mapred.JobClient: map 6% reduce 0%
13/10/26 04:05:01 INFO mapred.JobClient: map 7% reduce 0%
13/10/26 04:09:31 INFO mapred.JobClient: map 8% reduce 0%
13/10/26 04:13:55 INFO mapred.JobClient: map 9% reduce 0%
13/10/26 04:18:27 INFO mapred.JobClient: map 10% reduce 0%
13/10/26 04:23:09 INFO mapred.JobClient: map 11% reduce 0%
13/10/26 04:27:42 INFO mapred.JobClient: map 12% reduce 0%
13/10/26 04:31:59 INFO mapred.JobClient: map 13% reduce 0%
13/10/26 04:36:21 INFO mapred.JobClient: map 14% reduce 0%
13/10/26 04:40:49 INFO mapred.JobClient: map 15% reduce 0%
13/10/26 04:45:20 INFO mapred.JobClient: map 16% reduce 0%
13/10/26 04:49:38 INFO mapred.JobClient: map 17% reduce 0%
13/10/26 04:53:59 INFO mapred.JobClient: map 18% reduce 0%
13/10/26 04:58:21 INFO mapred.JobClient: map 19% reduce 0%
13/10/26 05:02:42 INFO mapred.JobClient: map 20% reduce 0%
13/10/26 05:06:57 INFO mapred.JobClient: map 21% reduce 0%
13/10/26 05:11:20 INFO mapred.JobClient: map 22% reduce 0%
13/10/26 05:15:36 INFO mapred.JobClient: map 23% reduce 0%
13/10/26 05:19:55 INFO mapred.JobClient: map 24% reduce 0%
13/10/26 05:24:26 INFO mapred.JobClient: map 25% reduce 0%
13/10/26 05:28:48 INFO mapred.JobClient: map 26% reduce 0%
13/10/26 05:33:16 INFO mapred.JobClient: map 27% reduce 0%
13/10/26 05:37:43 INFO mapred.JobClient: map 28% reduce 0%
13/10/26 05:42:09 INFO mapred.JobClient: map 29% reduce 0%
13/10/26 05:46:36 INFO mapred.JobClient: map 30% reduce 0%
13/10/26 05:48:53 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/26 05:48:53 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 1184ms.
13/10/26 05:48:55 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1239ms.
13/10/26 05:48:56 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 2134ms.
13/10/26 05:48:58 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 1728ms.
13/10/26 05:49:01 INFO mapred.JobClient: map 0% reduce 0%
13/10/26 05:52:05 INFO mapred.JobClient: map 1% reduce 0%
13/10/26 05:56:44 INFO mapred.JobClient: map 2% reduce 0%
13/10/26 06:01:27 INFO mapred.JobClient: map 3% reduce 0%
13/10/26 06:06:02 INFO mapred.JobClient: map 4% reduce 0%
13/10/26 06:10:34 INFO mapred.JobClient: map 5% reduce 0%
13/10/26 06:15:13 INFO mapred.JobClient: map 6% reduce 0%
13/10/26 06:19:49 INFO mapred.JobClient: map 7% reduce 0%
13/10/26 06:21:46 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.
13/10/26 06:21:46 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 1 fail over attempts. Trying to fail over after sleeping for 571ms.
13/10/26 06:21:47 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 2 fail over attempts. Trying to fail over after sleeping for 1910ms.
13/10/26 06:21:49 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 3 fail over attempts. Trying to fail over after sleeping for 1982ms.
13/10/26 06:21:51 WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10 after 4 fail over attempts. Trying to fail over after sleeping for 1868ms.
13/10/26 06:21:56 INFO mapred.JobClient: map 0% reduce 0%
13/10/26 06:24:52 INFO mapred.JobClient: map 1% reduce 0%
13/10/26 06:29:42 INFO mapred.JobClient: map 2% reduce 0%
13/10/26 06:34:32 INFO mapred.JobClient: map 3% reduce 0%
13/10/26 06:39:12 INFO mapred.JobClient: map 4% reduce 0%
13/10/26 06:43:48 INFO mapred.JobClient: map 5% reduce 0%
The job was running when the active JT failed (or was it forced off by the standby JT?) and the standby JT became active.
That in itself is fine, since they just switched roles, but the job restarted: the maps started again from 0%. This happens every time the JTs switch roles.
Maybe I did not configure the failover controller properly?
This is a very strange message and I can't find anything about it: (WARN retry.RetryInvocationHandler: Exception while invoking getTaskCompletionEvents of class $Proxy10. Trying to fail over immediately.)
The only WARN in the JT logs is:
WARN | mapreduce.Counters | Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead |
Thanks
Andrey
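The RetryInvocationHandler WARN lines in the client output above are the job client's RPC proxy failing over between the two JobTrackers: the first failover is attempted immediately, and later ones sleep for a randomized delay first. A minimal Python sketch of that kind of policy follows, assuming a capped exponential backoff with 0.5x to 1.5x jitter. It is modelled loosely on Hadoop's failover retry behaviour, not copied from it, and `base_ms`, `cap_ms`, and `max_failovers` are illustrative values, not the CDH 4 client defaults. It models only the client reconnecting; it says nothing about why the job itself restarted from 0%.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delay_ms(base_ms: int, failovers: int, cap_ms: int, rng: random.Random) -> float:
    """Capped exponential delay for the given failover count, jittered by 0.5x to 1.5x."""
    exp_ms = min(base_ms * (2 ** failovers), cap_ms)
    return exp_ms * (rng.random() + 0.5)


def call_with_failover(
    proxies: List[Callable[[], T]],
    max_failovers: int = 15,
    base_ms: int = 500,
    cap_ms: int = 15000,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call proxies[current]; on a connection error, switch to the next proxy and retry.

    Illustrative only: base_ms, cap_ms and max_failovers are made-up values.
    """
    rng = rng or random.Random()
    current = 0
    for failovers in range(max_failovers + 1):
        try:
            return proxies[current]()
        except ConnectionError:
            if failovers == max_failovers:
                raise  # give up and surface the error to the caller
            current = (current + 1) % len(proxies)  # try the other JobTracker
            if failovers > 0:  # the first failover happens immediately
                sleep(backoff_delay_ms(base_ms, failovers, cap_ms, rng) / 1000.0)
    raise AssertionError("unreachable")
```

With the default `sleep=time.sleep` the caller blocks between attempts; passing a recording function such as `list.append` instead makes the delay sequence easy to inspect.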
Created 10-27-2013 08:29 AM
Hey Andrey,
Based on the description from the Hue side, you are definitely running into the Hue bug. Jobs would work until the JT fails over and then stop.
As for the MapReduce issue, I'm not actually sure; I'll take a look around. It may be worthwhile to post that in the MR community as well.
Thanks
Chris
Created 10-27-2013 08:38 AM
Hey Andrey,
Somehow I thought this forum was strictly for Oozie :-). Sorry for the confusion; this is the right place for MR, so maybe one of the MR folks can chime in on that side of the question. I will still take a look around and see if I can come up with something.
Thanks
Chris
Created 10-27-2013 11:41 PM
Hey Chris,
That's ok:)
Thanks
Andrey
Created 04-04-2014 12:07 PM
We ran into a similar issue with JobTracker HA, Oozie, and Hue. We can submit jobs to Oozie from the command line with jobTracker=logicaljt, which works fine, but if they fail and we want to rerun them through Hue, Hue replaces the jobTracker variable with the first JobTracker it can find. Unfortunately for us, that is currently the standby one.
We managed to work around it by adding the correct JobTracker to the hue.ini safety valve:
[hadoop]
[[mapred_clusters]]
[[[default]]]
jobtracker_host=correct-jobtracker.example.com
Of course, we're mainly looking forward to Hue 3.5, where this issue is fixed.
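Submitting from the command line works because job.properties names the logical JobTracker rather than a physical host. A small Python sketch that renders such a file is below; the nameservice name and workflow path are hypothetical placeholders, and it assumes the client-side JobTracker HA configuration is what resolves `logicaljt` to the active JobTracker.

```python
from typing import Dict


def render_job_properties(props: Dict[str, str]) -> str:
    """Render key=value lines for an Oozie job.properties file (sorted for stable output)."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))


def ha_job_properties(logical_jt: str, nameservice: str, app_path: str) -> str:
    """Build job.properties that target logical HA names instead of physical hosts."""
    return render_job_properties({
        # Logical JobTracker name (e.g. "logicaljt"); the client's HA settings
        # are expected to route it to whichever JobTracker is currently active.
        "jobTracker": logical_jt,
        # Hypothetical HDFS HA nameservice; substitute your own.
        "nameNode": f"hdfs://{nameservice}",
        "oozie.wf.application.path": f"hdfs://{nameservice}{app_path}",
    })


if __name__ == "__main__":
    print(ha_job_properties("logicaljt", "nameservice1", "/user/example/workflow"), end="")
```

Run as a script, this prints `jobTracker=logicaljt`, `nameNode=hdfs://nameservice1`, and `oozie.wf.application.path=hdfs://nameservice1/user/example/workflow`, one per line.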
Created 04-04-2014 01:14 PM
This is also fixed in CDH 4.6, and there is a patched version of CDH 4.5 that resolves it as well; if you are running CDH 4.5, you can open a support ticket to get it. You would have to add a new parameter, "logical_name", for example:
[hadoop]
[[mapred_clusters]]
[[[default]]]
jobtracker_host=cdh45-2.qa.test.com
thrift_port=9290
jobtracker_port=8021
submit_to=true
hadoop_mapred_home={{HADOOP_MR1_HOME}}
hadoop_bin={{HADOOP_BIN}}
hadoop_conf_dir={{HADOOP_CONF_DIR}}
security_enabled=true
logical_name=logicaljt
[[[jtha]]]
jobtracker_host=cdh45-1.qa.test.com
thrift_port=9290
jobtracker_port=8021
submit_to=true
hadoop_mapred_home={{HADOOP_MR1_HOME}}
hadoop_bin={{HADOOP_BIN}}
hadoop_conf_dir={{HADOOP_CONF_DIR}}
security_enabled=true
logical_name=logicaljt
Hope this helps.