About ttruong

ttruong · 10-08-2015

We were able to get it up. Thank you very much.

ttruong · 10-07-2015

We have a 3-node ZooKeeper quorum, and one of the nodes was accidentally terminated on AWS. Our cluster is on CDH 5.3.3 and runs these services: HDFS, YARN, HBase, Oozie, and ZooKeeper. We would like to add the node back to the quorum. Besides being part of the ZooKeeper quorum, the node also had these roles: HBase Master (failed over successfully), YARN ResourceManager (HA, failed over successfully), JournalNode (HA), and Oozie. Does anyone know how to do this? If so, please provide the steps. Thanks,
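For anyone in the same situation, a quick way to see what the surviving quorum members report before and after re-adding a node is ZooKeeper's `stat` four-letter command. The following is only a minimal sketch, assuming that command is enabled on the servers and that the client port is the default 2181; the function names are made up for illustration, and the host names you pass in are your own.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a ZooKeeper four-letter command (e.g. "stat") and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_reply):
    """Return the value of the "Mode:" line in a `stat` reply, or None if absent."""
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def check_quorum(hosts, port=2181):
    """Map each host to its reported mode; None means unreachable or no mode reported."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port, "stat"))
        except OSError:
            modes[host] = None
    return modes


def quorum_summary(modes):
    """Count leaders, followers, and hosts with no known mode."""
    values = list(modes.values())
    return {
        "leader": values.count("leader"),
        "follower": values.count("follower"),
        "unknown": values.count(None),
    }
```

Calling `quorum_summary(check_quorum([...]))` with the three quorum host names gives a rough picture of whether the ensemble still has a leader and how many members are answering. It does not replace the steps for restoring the node itself.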

ttruong · 07-06-2015

Thank you very much for your response. I checked both ResourceManager nodes, and the value for the yarn.resourcemanager.webapp.address property was set on each. I still could not get it working.
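With ResourceManager HA, the web address is normally configured once per RM id, using the suffixed form yarn.resourcemanager.webapp.address.<rm-id>, where the ids come from yarn.resourcemanager.ha.rm-ids. A small sketch like the one below can list what a yarn-site.xml actually contains for each id. The sample XML is invented for illustration: rm54 appears in the log output elsewhere in this thread, while rm55 and both host names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml fragment; rm55 and the host names are placeholders.
SAMPLE_YARN_SITE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm54,rm55</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm54</name>
    <value>rm-host-a.example:8088</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def rm_webapp_addresses(props):
    """Return {rm_id: webapp address or None} for every id in yarn.resourcemanager.ha.rm-ids."""
    raw_ids = props.get("yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [rm_id.strip() for rm_id in raw_ids.split(",") if rm_id.strip()]
    return {
        rm_id: props.get("yarn.resourcemanager.webapp.address." + rm_id)
        for rm_id in rm_ids
    }


if __name__ == "__main__":
    for rm_id, address in rm_webapp_addresses(load_properties(SAMPLE_YARN_SITE)).items():
        print(rm_id, address or "(not set)")
```

An id that maps to None has no per-RM web address in that file, which is worth ruling out on every node that reads the configuration.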

ttruong · 06-02-2015

Hi Linou, Were you able to resolve your issue? I still have not been able to solve it. Thanks

ttruong · 06-01-2015

Hi Wilfred, We are on 5.3.3 now, and we are still having that issue. Also, I tried the yarn command with the -appOwner option, but it still returned the same message. I ran the command as the yarn user. Thanks

ttruong · 05-28-2015

Hi Wilfred, I really don't want to disable the ACL. I only tried disabling it to see whether it would help resolve the issue. I prefer to get the application log through the UI instead of logging in to the host to retrieve the logs, because not everyone has access to the host. By the way, I just tried to retrieve the log based on your suggestion by issuing the command below, but it still did not work.

yarn logs -applicationId application_1432831904896_0149

I got this message:

15/05/28 14:51:17 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm54
Application has not completed. Logs are only available after an application completes
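Since the message says logs are only available once the application completes, it can help to check the state the ResourceManager reports before running yarn logs. Here is a rough sketch using the RM REST endpoint /ws/v1/cluster/apps/<application id>. The helper names are made up, the RM web address is whatever your cluster uses, and the sketch assumes the endpoint is reachable without authentication.

```python
import json
from urllib.request import urlopen

# Terminal application states reported by the ResourceManager.
COMPLETED_STATES = {"FINISHED", "FAILED", "KILLED"}


def fetch_app_json(rm_webapp, app_id, timeout=10):
    """Fetch the application report from the RM REST API as JSON text."""
    url = "http://%s/ws/v1/cluster/apps/%s" % (rm_webapp, app_id)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def app_state(app_json):
    """Extract the "state" field from an RM application report."""
    return json.loads(app_json)["app"]["state"]


def has_completed(state):
    """True when the application is in a terminal state."""
    return state in COMPLETED_STATES


def logs_command(app_id, owner=None):
    """Build the yarn logs command line, adding -appOwner when an owner is given."""
    command = ["yarn", "logs", "-applicationId", app_id]
    if owner:
        command += ["-appOwner", owner]
    return command
```

If `has_completed(app_state(fetch_app_json(...)))` is false, the "Application has not completed" message is the expected answer and does not by itself point at an ACL or permission problem.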

ttruong · 05-21-2015

Hi Wilfred, Have you had a chance to take a look at my update? Thanks

ttruong · 05-18-2015

There were not many errors in the ResourceManager log. Whenever I click on the child URL, messages like the ones below appear in the RM log. I ran the job as a non-oozie user, but it also happens when I run it as the oozie user.

The link I tried to access: http://i-802bd856.prod-dis11.aws1:8088/proxy/application_1431873888015_3748

Here is the partial log:

2015-05-18 08:41:37,237 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://i-922ad944.prod-dis11.aws1:33802/ws/v1/mapreduce/jobs/job_1431873888015_3748 which is the app master GUI of application_1431873888015_3748 owned by inventory
2015-05-18 08:41:37,237 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://i-9f2ad949.prod-dis11.aws1:47581/ws/v1/mapreduce/jobs/job_1431873888015_3758 which is the app master GUI of application_1431873888015_3758 owned by oozie
2015-05-18 08:41:43,844 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1431873888015_3758_000001 with final state: FINISHING, and exit status: -1000
2015-05-18 08:41:43,846 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://i-922ad944.prod-dis11.aws1:33802/ which is the app master GUI of application_1431873888015_3748 owned by inventory
2015-05-18 08:41:43,846 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1431873888015_3758_000001 State change from RUNNING to FINAL_SAVING
2015-05-18 08:41:43,846 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1431873888015_3758 with final state: FINISHING
2015-05-18 08:41:43,848 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1431873888015_3758/appattempt_1431873888015_3758_000001 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED

I found some other people who had a similar log message about dr.who, and they were able to resolve it by changing the hadoop.http.staticuser.user property or disabling the ACL. I tried disabling the ACL by setting yarn.acl.enable = false, but it did not help.
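For context, dr.who is the default value of hadoop.http.staticuser.user, so these lines show the proxy treating the browser as that static user instead of the job owner. To see at a glance which applications are being accessed that way, the WebAppProxyServlet lines can be pulled apart with a regular expression. This is only a sketch written against the log format shown above; the pattern and function names are my own, and other Hadoop versions may word the message differently.

```python
import re

# Matches the "is accessing unchecked" lines in the partial log above.
PROXY_LINE = re.compile(
    r"WebAppProxyServlet: (?P<remote_user>\S+) is accessing unchecked "
    r"(?P<url>\S+) which is the app master GUI of "
    r"(?P<app_id>application_\d+_\d+) owned by (?P<owner>\S+)"
)


def parse_proxy_line(line):
    """Return remote_user, url, app_id and owner for a proxy access line, else None."""
    match = PROXY_LINE.search(line)
    return match.groupdict() if match else None


def mismatched_accesses(lines):
    """Yield parsed entries where the remote user is not the application owner."""
    for line in lines:
        entry = parse_proxy_line(line)
        if entry and entry["remote_user"] != entry["owner"]:
            yield entry
```

Feeding the RM log through `mismatched_accesses` gives one record per request where the web user and the application owner differ, which makes it easier to confirm whether every failing request arrives as dr.who.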

ttruong · 05-13-2015

I was able to fix the issue. We use Chef to set up the HBase configuration on the worker nodes, but a problem with the Chef settings caused the HBase connection settings to be missing on the worker node. After I fixed Chef, the HBase connection was set up fine. Thanks

ttruong · 05-13-2015

Hi Wilfred, Yes, it is logged in the RM, and I do have HA set up for the RM. I have HA set up for HDFS as well. The job did show up on the RM. Below is a screenshot of the RM UI page, which lists all running jobs. When I clicked on the ApplicationMaster link for each job under the Tracking UI column, the error page was shown instead of the actual status page. I also got the same error when I clicked the search icon for Child Job 1 under Child Job URLs in the Oozie UI. Here is a screenshot of the Oozie page. Here is the screenshot of the error page. I could get to the status page fine after the job completed, but not while it was still running. Thank you very much for your help.

Last Visited	03-27-2017 01:24 PM

Member Since	02-24-2015 11:38 AM
Posts	27
Kudos received	5

Cloudera Community

Re: org.apache.hadoop.hbase.zookeeper.RecoverableZ...

Re: The active NameNode is out of sync with this J...

Re: 1 of the 3 node Zookeeper quorum failed, how t...

1 of the 3 node Zookeeper quorum failed, how to ad...

Re: Got 500 Error when trying to view status of a ...