01-07-2019
12:01 PM
Thanks @Guillaume Roger. I was facing the above issue because the ats-hbase.json file is present under the ats-hbase user in HDFS, and running the command from any other user doesn't help. I was able to fix the issue by trying a couple of things:
1. Running the commands below.
2. Restarting Timeline Service V2.0 through Ambari.
I am not sure whether it actually started ats-hbase, because I cannot see any service running in the Resource Manager, which was the case earlier.
curl -k -u: -H "Content-Type: application/json" -X PUT http://<ResourceManagerHost>:<ResourceManagerPort>/app/v1/services/ats-hbase?user.name=yarn-ats -d '{"state": "STOPPED"}'
curl -k -u: -H "Content-Type: application/json" -X PUT http://<ResourceManagerHost>:<ResourceManagerPort>/app/v1/services/ats-hbase?user.name=yarn-ats -d '{"state": "STARTED"}'
I am not sure restarting helped, because I had restarted the cluster a few times and that might have restarted the timeline server. I used the document below for reference: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/data-operating-system/content/options_to_restart_ats-hbase.html
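For completeness, a minimal sketch of how the same YARN Services REST API (and the yarn CLI) could be used to confirm whether ats-hbase actually came back up after the PUT calls above; the host, port, and exact response fields here are assumptions based on the commands already shown, not something verified on this cluster:
# Hedged check: ask the same REST endpoint used above for the service definition and state.
curl -k -u: -H "Content-Type: application/json" -X GET http://<ResourceManagerHost>:<ResourceManagerPort>/app/v1/services/ats-hbase?user.name=yarn-ats
# The JSON response should carry a "state" field; anything other than STOPPED would suggest the restart took effect.
# Assuming the service definition lives under the yarn-ats user, the CLI can report the same thing:
sudo -u yarn-ats yarn app -status ats-hbase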
01-06-2019
04:33 PM
@Guillaume Roger / @Aditya Sirna How do we start ats-hbase? I tried sudo yarn app -start ats-hbase, but it gives the error: ERROR client.ApiServiceClient: File does not exist: .yarn/services/ats-hbase/ats-hbase.json. The service was running on my cluster; I stopped it from the Resource Manager Services tab because I had to reconfigure the container size and this service was still using the old container size. But now I cannot see any service in the tab to start. Please help, as the absence of this service causes jobs to hang (for approximately 5 minutes) after 100% completion of mappers and reducers.
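A hedged sketch of one thing to try, based on the update above that the ats-hbase.json definition sits under a specific user's home directory in HDFS; the yarn-ats user name and the /user/yarn-ats path are assumptions taken from the REST calls in that update and the default HDFS home layout:
# Run the service CLI as the user that owns the service definition instead of my own user.
sudo -u yarn-ats yarn app -start ats-hbase
# Verify the definition is actually visible to that user (relative path taken from the error message):
sudo -u yarn-ats hdfs dfs -ls /user/yarn-ats/.yarn/services/ats-hbase/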
01-02-2019
09:29 AM
@Jagadeesan A S Could you confirm whether this is the same issue as the YARN-6078 revert? Is there a way to avoid or minimize the occurrence of the issue through some configuration changes?
12-31-2018
11:33 PM
I am using 3.0.0.0-1634.
The annoying part is that the issue is random. The same job runs fine, and then suddenly it fails with this issue; the next time, some other job that had run successfully earlier (and will run properly again later) might fail after a couple of days. There are no other jobs running on the cluster at that time.
18/12/29 20:25:21 INFO mapreduce.Job: Running job: job_1546013184089_0046
18/12/29 20:49:47 INFO mapreduce.Job: Job job_1546013184089_0046 running in uber mode : false
18/12/29 20:49:47 INFO mapreduce.Job: map 0% reduce 0%
18/12/29 20:49:47 INFO mapreduce.Job: Job job_1546013184089_0046 failed with state FAILED due to: Application application_1546013184089_0046 failed 2 times due to ApplicationMaster for attempt appattempt_1546013184089_0046_000002 timed out. Failing the application.
18/12/29 20:49:47 INFO mapreduce.Job: Counters: 0
18/12/29 20:49:47 ERROR crawl.DeduplicationJob: DeduplicationJob: java.io.IOException: Job failed!
In the Resource Manager, under Diagnostics, I found the error below:
Application application_1546114179060_0069 failed 2 times due to ApplicationMaster for attempt appattempt_1546114179060_0069_000002 timed out. Failing the application.
I also found the error below somewhere in the log:
java.lang.Exception: Container is not yet running. Current state is LOCALIZING
It's a 4-node cluster. Available vcores are 100 and memory is 416 GB when nothing is running on the cluster.
Jobs are submitted through the default queue.
The minimum container size is 4 GB.
Capacity Scheduler:
capacity-scheduler=null
yarn.scheduler.capacity.maximum-am-resource-percent=0.5
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.acl_submit_applications=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default
yarn.scheduler.capacity.schedule-asynchronously.enable=true
yarn.scheduler.capacity.schedule-asynchronously.maximum-threads=1
yarn.scheduler.capacity.schedule-asynchronously.scheduling-interval-ms=10
I tried restarting the whole cluster once. I also tried just restarting YARN once.
The funny part is that the job runs fine if I simply rerun it without changing anything.
I never faced this issue before; it suddenly started popping up in the last few days.
I did change the minimum container size from 13 GB to 4 GB, but everything worked fine for 2 weeks after that, so I don't think that is the issue. Apart from that, I haven't changed anything on the cluster.
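Since the failure message is "ApplicationMaster ... timed out", one thing worth checking is the AM liveness timeout the Resource Manager is actually running with. A rough, hedged sketch for checking it; the property name yarn.am.liveness-monitor.expiry-interval-ms (default 600000 ms), the RM /conf endpoint, and port 8088 are assumptions on my part, not settings mentioned anywhere in this thread:
# Dump the RM's effective configuration and look for the AM liveness timeout.
# The grep context may need adjusting depending on how the XML is laid out.
curl -s "http://<ResourceManagerHost>:8088/conf" | grep -A 2 "yarn.am.liveness-monitor.expiry-interval-ms"
# Diagnostics for the failed AM attempt can also be listed directly:
yarn applicationattempt -list application_1546013184089_0046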
Labels:
- Apache Hadoop
- Apache YARN
12-31-2018
03:12 PM
@Jagadeesan A S I am using 3.0.0.0-1634, but I am still facing this issue. Is it resolved in 3.0.0 or 3.1.0? The annoying part is that the issue is random; it picks any job at any time. The same job runs fine, and then suddenly it fails with this issue; the next time, some other job that had run successfully earlier (and will run properly again later) might fail a couple of days later. There are no other jobs running at that time.