01-07-2019
12:01 PM
Thanks @Guillaume Roger. I was facing the above issue because the ats-hbase.json file is present under the ats-hbase user in HDFS, and running the command from any other user doesn't help. I was able to fix the issue by trying a couple of things:
1. Running the commands below.
2. Restarting Timeline Service V2.0 through Ambari.
I am not sure whether it actually started ats-hbase, because I cannot see any service running in the Resource Manager, which was the case earlier.
curl -k -u: -H "Content-Type: application/json" -X PUT http://<ResourceManagerHost>:<ResourceManagerPort>/app/v1/services/ats-hbase?user.name=yarn-ats -d '{"state": "STOPPED"}'
curl -k -u: -H "Content-Type: application/json" -X PUT http://<ResourceManagerHost>:<ResourceManagerPort>/app/v1/services/ats-hbase?user.name=yarn-ats -d '{"state": "STARTED"}'
I am not sure restarting helped, because I had restarted the cluster a few times and that might have restarted the timeline server. I used the document below for reference: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/data-operating-system/content/options_to_restart_ats-hbase.html
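For completeness, a minimal sketch of how the same YARN Services REST API (and the yarn CLI) could be used to confirm whether ats-hbase actually came back up after the PUT calls above; the host, port, and exact response fields here are assumptions based on the commands already shown, not something verified on this cluster:
# Hedged check: ask the same REST endpoint used above for the service definition and state.
curl -k -u: -H "Content-Type: application/json" -X GET http://<ResourceManagerHost>:<ResourceManagerPort>/app/v1/services/ats-hbase?user.name=yarn-ats
# The JSON response should carry a "state" field; anything other than STOPPED would suggest the restart took effect.
# Assuming the service definition lives under the yarn-ats user, the CLI can report the same thing:
sudo -u yarn-ats yarn app -status ats-hbase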
01-06-2019
04:33 PM
@Guillaume Roger / @Aditya Sirna How do we start ats-hbase? I tried sudo yarn app -start ats-hbase, but it gives the error: ERROR client.ApiServiceClient: File does not exist: .yarn/services/ats-hbase/ats-hbase.json. The service was running on my cluster; I stopped it from the Resource Manager Services tab because I had to reconfigure the container size and this service was still using the old container size. But now I cannot see any service in the tab to start. Please help, as the absence of this service causes jobs to hang (for approximately 5 minutes) after 100% completion of mappers and reducers.
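A hedged sketch of one thing to try, based on the update above that the ats-hbase.json definition sits under a specific user's home directory in HDFS; the yarn-ats user name and the /user/yarn-ats path are assumptions taken from the REST calls in that update and the default HDFS home layout:
# Run the service CLI as the user that owns the service definition instead of my own user.
sudo -u yarn-ats yarn app -start ats-hbase
# Verify the definition is actually visible to that user (relative path taken from the error message):
sudo -u yarn-ats hdfs dfs -ls /user/yarn-ats/.yarn/services/ats-hbase/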
01-02-2019
09:29 AM
@Jagadeesan A S Could you confirm whether this is the same issue as the YARN-6078 revert? Is there a way to avoid or minimize the occurrence of the issue through some configuration changes?
12-31-2018
11:33 PM
I am using 3.0.0.0-1634.
The annoying part is that the issue is random. The same job runs fine, and then suddenly it fails with this issue; the next time, some other job that had run successfully earlier (and will run properly again later) might fail after a couple of days. There are no other jobs running on the cluster at that time.
18/12/29 20:25:21 INFO mapreduce.Job: Running job: job_1546013184089_0046
18/12/29 20:49:47 INFO mapreduce.Job: Job job_1546013184089_0046 running in uber mode : false
18/12/29 20:49:47 INFO mapreduce.Job: map 0% reduce 0%
18/12/29 20:49:47 INFO mapreduce.Job: Job job_1546013184089_0046 failed with state FAILED due to: Application application_1546013184089_0046 failed 2 times due to ApplicationMaster for attempt appattempt_1546013184089_0046_000002 timed out. Failing the application.
18/12/29 20:49:47 INFO mapreduce.Job: Counters: 0
18/12/29 20:49:47 ERROR crawl.DeduplicationJob: DeduplicationJob: java.io.IOException: Job failed!
In the Resource Manager, under Diagnostics, I found the error below:
Application application_1546114179060_0069 failed 2 times due to ApplicationMaster for attempt appattempt_1546114179060_0069_000002 timed out. Failing the application.
I also found the error below somewhere in the log:
java.lang.Exception: Container is not yet running. Current state is LOCALIZING
It's a 4-node cluster. Available vcores are 100 and memory is 416 GB when nothing is running on the cluster.
Jobs are submitted through the default queue.
The minimum container size is 4 GB.
Capacity Scheduler:
capacity-scheduler=null
yarn.scheduler.capacity.maximum-am-resource-percent=0.5
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.acl_submit_applications=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default
yarn.scheduler.capacity.schedule-asynchronously.enable=true
yarn.scheduler.capacity.schedule-asynchronously.maximum-threads=1
yarn.scheduler.capacity.schedule-asynchronously.scheduling-interval-ms=10
I tried restarting the whole cluster once. I also tried just restarting YARN once.
The funny part is that the job runs fine if I simply rerun it without changing anything.
I never faced this issue before; it suddenly started popping up in the last few days.
I did change the minimum container size from 13 GB to 4 GB, but everything worked fine for 2 weeks after that, so I don't think that is the issue. Apart from that, I haven't changed anything on the cluster.
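Since the failure message is "ApplicationMaster ... timed out", one thing worth checking is the AM liveness timeout the Resource Manager is actually running with. A rough, hedged sketch for checking it; the property name yarn.am.liveness-monitor.expiry-interval-ms (default 600000 ms), the RM /conf endpoint, and port 8088 are assumptions on my part, not settings mentioned anywhere in this thread:
# Dump the RM's effective configuration and look for the AM liveness timeout.
# The grep context may need adjusting depending on how the XML is laid out.
curl -s "http://<ResourceManagerHost>:8088/conf" | grep -A 2 "yarn.am.liveness-monitor.expiry-interval-ms"
# Diagnostics for the failed AM attempt can also be listed directly:
yarn applicationattempt -list application_1546013184089_0046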
Labels:
- Apache Hadoop
- Apache YARN
12-31-2018
03:12 PM
@Jagadeesan A S I am using 3.0.0.0-1634, but I am still facing this issue. Is it resolved in 3.0.0 or 3.1.0? The annoying part is that the issue is random; it picks any job at any time. The same job runs fine, and then suddenly it fails with this issue; the next time, some other job that had run successfully earlier (and will run properly again later) might fail a couple of days later. There are no other jobs running at that time.