Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Explorer

Hello Team,

 

Can someone please help me understand what is wrong here?

 

Standby Resource Manager logs:

==========================

 

2020-01-07 04:51:58,686 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2020-01-07 04:51:58,691 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 04:51:58,696 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 04:51:58,697 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 04:52:03,727 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@7ffeac8e:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
2020-01-07 04:52:03,754 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@7ffeac8e:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring.
2020-01-07 04:52:03,754 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@7ffeac8e:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring.
2020-01-07 04:52:03,754 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@7ffeac8e:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.

 

 

Active Resource Manager Logs:

========================

 

2020-01-07 05:00:09,047 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2020-01-07 05:00:09,054 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 05:00:09,059 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 05:00:09,060 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 05:00:09,175 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x36f7ed02da705b0
2020-01-07 05:00:09,180 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2020-01-07 05:00:09,196 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Returning, interrupted : java.lang.InterruptedException
2020-01-07 05:00:09,197 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.
2020-01-07 05:00:09,199 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2020-01-07 05:00:13,564 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@18fd290:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
2020-01-07 05:00:13,590 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@18fd290:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring.
2020-01-07 05:00:13,590 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@18fd290:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring.
2020-01-07 05:00:13,590 WARN org.apache.hadoop.conf.Configuration: java.io.BufferedInputStream@18fd290:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.

 

 

The jobs are running very slowly. I can see the job logs on the server, but the applications are not showing in the YARN Applications tab in Cloudera Manager.

 

Appreciate your help.

 

Best Regards,

Vinod

 

 

11 REPLIES

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Explorer

Hello @EricL 

 

Can you please help me?

No jobs are running in YARN, and we are getting the warnings and errors below in the NodeManager:

 

 

2020-01-07 11:07:45,764 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2020-01-07 11:07:45,977 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2020-01-07 11:07:59,163 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.

 

 

 

 

Resource Manager Logs,

===================

 

 

2020-01-07 10:18:59,364 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2020-01-07 10:18:59,371 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 10:18:59,375 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 10:18:59,377 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-01-07 10:18:59,494 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x16f807d0eea02cd
2020-01-07 10:18:59,499 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2020-01-07 10:18:59,517 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Returning, interrupted : java.lang.InterruptedException
2020-01-07 10:18:59,518 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.
2020-01-07 10:18:59,519 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

 

 

Job History Server :

===============

 

2020-01-07 11:08:14,980 WARN org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: Could not read the contents of hdfs://nameservice1/tmp/logs/mcaf/logs Permission denied: user=mapred, access=READ_EXECUTE, inode="/tmp/logs/mcaf/logs":mcaf:supergroup:drwxrwx---
2020-01-07 11:08:14,980 INFO org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: aggregated log deletion finished.
2020-01-07 11:08:15,038 INFO org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://nameservice1:8020]
2020-01-07 11:08:43,242 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: History Cleaner started
2020-01-07 11:08:43,348 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: History Cleaner complete
2020-01-07 11:11:13,239 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files

 

 

Thanks in advance.....!!!

 

Best Regards,

Vinod

 

 

 

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Explorer

Hello Team,

 

Any update would be appreciated; we are still seeing the same problem.

 

Thanks,

Vinod 

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Guru
@kvinod,

Sorry, I don't have much of a clue here. I am checking with my team to see if anyone else can help.

In the meantime, can you let us know how many NodeManagers are shown as healthy in the RM web UI?
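
The same count is also available from the ResourceManager REST endpoint `/ws/v1/cluster/metrics`, which reports node counters under `clusterMetrics`. The sketch below only summarizes a response body that has already been fetched. The host name `rm-host.example.com`, the default port 8088, and the sample numbers are placeholders, not values from this cluster, and the fetch step would need adjusting if the web UI is served over HTTPS or requires Kerberos.

```python
import json

def summarize_nodes(metrics_json: str) -> dict:
    """Pick the node counters out of a /ws/v1/cluster/metrics response body."""
    metrics = json.loads(metrics_json)["clusterMetrics"]
    keys = ("totalNodes", "activeNodes", "unhealthyNodes", "lostNodes")
    # Counters that are absent from the response are reported as 0.
    return {k: int(metrics.get(k, 0)) for k in keys}

# Hypothetical response body, e.g. saved from
#   curl http://rm-host.example.com:8088/ws/v1/cluster/metrics
sample = (
    '{"clusterMetrics": {"totalNodes": 5, "activeNodes": 3, '
    '"unhealthyNodes": 1, "lostNodes": 1}}'
)
print(summarize_nodes(sample))
```

If `activeNodes` is lower than the number of NodeManager roles in Cloudera Manager, the missing nodes are the ones to look at first.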

Also, are you able to upload the active RM logs so that we can see the bigger picture? The log messages you provided are very limited.

Cheers
Eric

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Guru
And please let us know whether you are using HDP or CDH, as they might behave differently.

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Explorer

Hi @EricL 

 

We are using Cloudera Manager.

Another clue is that whenever we submit jobs, they launch in local mode, not YARN mode.

We have a YARN gateway on the edge node, but the jobs are not launching in YARN.

 

When I run a test on the edge node:

yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen 500000000 /tmp/test5_teragen7
20/01/09 02:24:18 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
20/01/09 02:24:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
20/01/09 02:24:19 INFO terasort.TeraSort: Generating 500000000 using 1
20/01/09 02:24:19 INFO mapreduce.JobSubmitter: number of splits:1
20/01/09 02:24:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1658357393_0001
20/01/09 02:24:20 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
20/01/09 02:24:20 INFO mapreduce.Job: Running job: job_local1658357393_0001
20/01/09 02:24:20 INFO mapred.LocalJobRunner: OutputCommitter set in config null
20/01/09 02:24:20 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/01/09 02:24:20 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/01/09 02:24:20 INFO mapred.LocalJobRunner: Waiting for map tasks
20/01/09 02:24:20 INFO mapred.LocalJobRunner: Starting task: attempt_local1658357393_0001_m_000000_0
20/01/09 02:24:20 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/01/09 02:24:20 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
20/01/09 02:24:20 INFO mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@4d182ef0
20/01/09 02:24:21 INFO mapreduce.Job: Job job_local1658357393_0001 running in uber mode : false
20/01/09 02:24:21 INFO mapreduce.Job: map 0% reduce 0%

 

 

I can't see this job in YARN Applications.

 

When I ran the same command in another environment, it launched an application, and I can see it in YARN Applications.

 

yarn jar /opt/cloudera/parcels/CDH-5.4.10-1.cdh5.4.10.p0.16/jars/hadoop-examples.jar teragen 500000000 /tmp/test5_teragen33
20/01/09 02:27:39 INFO terasort.TeraSort: Generating 500000000 using 2
20/01/09 02:27:39 INFO mapreduce.JobSubmitter: number of splits:2
20/01/09 02:27:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1569582055993_48877
20/01/09 02:27:39 INFO impl.YarnClientImpl: Submitted application application_1569582055993_48877
20/01/09 02:27:39 INFO mapreduce.Job: The url to track the job: http://tpacoslmpp050.enterprisenet.org:8088/proxy/application_1569582055993_48877/
20/01/09 02:27:39 INFO mapreduce.Job: Running job: job_1569582055993_48877
20/01/09 02:27:45 INFO mapreduce.Job: Job job_1569582055993_48877 running in uber mode : false
20/01/09 02:27:45 INFO mapreduce.Job: map 0% reduce 0%
20/01/09 02:27:55 INFO mapreduce.Job: map 2% reduce 0%

 

I have actually removed the YARN service, re-added it, and re-created the JobHistory directory and the NodeManager logs directory through Cloudera Manager.

We are using Cloudera Express 5.16.2 and CDH 5.4.10 (parcels).

 

We are still facing the same issue after re-adding YARN to the environment.

 

We want to launch the jobs in YARN mode, not local mode.

Please advise.

 

Thanks,

Vinod

 

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Guru
Hi Vinod,

If your YARN jobs are running in local mode, then it means your YARN Gateway role is not installed properly. If you go to CM > YARN > Instances, can you confirm that this edge node is on the list and has the Gateway role assigned?
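
One way to check this directly on the edge node is to look at `mapreduce.framework.name` in the client configuration: when it is missing or set to `local`, MapReduce uses the LocalJobRunner (the `job_local...` IDs seen earlier in this thread), and only `yarn` submits to the ResourceManager. The sketch below is a minimal check. The path `/etc/hadoop/conf/mapred-site.xml` is an assumption about where the deployed client configuration lives on that host.

```python
import xml.etree.ElementTree as ET

def framework_name(mapred_site_xml) -> str:
    """Return mapreduce.framework.name from mapred-site.xml content, or 'local' if unset."""
    root = ET.fromstring(mapred_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "mapreduce.framework.name":
            return (prop.findtext("value") or "").strip()
    # Hadoop's built-in default when the property is absent.
    return "local"

if __name__ == "__main__":
    # Assumed location of the client configuration on the edge node.
    path = "/etc/hadoop/conf/mapred-site.xml"
    try:
        with open(path, "rb") as f:
            print(path, "->", framework_name(f.read()))
    except (OSError, ET.ParseError) as err:
        print("could not check", path, ":", err)
```

If this prints `local` even though the Gateway role is assigned, the client configuration on that host may be stale or may be read from a different directory than expected.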

What about HDFS, do you have the same issue? If you run an "hdfs" command, does it list files in HDFS or on the local file system?

Cheers
Eric

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Explorer

Hi @EricL ,

 

Thanks for your response,

 

Yes, I can see that Gateway roles have been assigned to the edge node.

Yes, I am able to list the files in HDFS using hdfs commands.

Only YARN has the issue: we are able to access the HBase shell, the Hive shell, and HDFS, but not YARN.

I ran the sample MapReduce job below:

 

yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen 500000000 /tmp/test5_teragen0
20/01/09 06:12:26 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-
20/01/09 06:12:26 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
20/01/09 06:12:26 INFO terasort.TeraSort: Generating 500000000 using 1
20/01/09 06:12:26 INFO mapreduce.JobSubmitter: number of splits:1
20/01/09 06:12:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1886679574_0001
20/01/09 06:12:27 INFO mapreduce.Job: The url to track the job: 
20/01/09 06:12:27 INFO mapreduce.Job: Running job: job_local1886679574_0001

 

It did not create any application ID.

 

Please reply with your suggestions.

 

Best Regards,

Vinod

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Explorer

Hello @EricL 

 

Can you please help us understand what could be the reason for this issue?

 

Best Regards,

Vinod

Re: Yarn is not stable and getting SIGTERM 15 and connection refused errors in resource manager and job history server

Guru
@kvinod ,

Can you try running the same command from the RM host and see whether it also runs in local mode? This can confirm whether your edge node is set up incorrectly.
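
When comparing the two hosts, the job ID printed by the client is enough to tell the modes apart: the LocalJobRunner produces IDs of the form `job_local<number>_<seq>`, while a job submitted to YARN gets `job_<clusterTimestamp>_<seq>` together with a matching `application_...` ID. A small sketch that classifies a line of client output follows. The function name `submission_mode` is illustrative only, and the two sample lines are taken from the outputs posted earlier in this thread.

```python
import re

# "local" directly after "job_" marks a LocalJobRunner job.
JOB_ID = re.compile(r"\bjob_(local)?(\d+)_(\d+)\b")

def submission_mode(client_output: str) -> str:
    """Return 'local', 'yarn', or 'unknown' based on the first job ID found."""
    match = JOB_ID.search(client_output)
    if match is None:
        return "unknown"
    return "local" if match.group(1) else "yarn"

edge = "20/01/09 06:12:27 INFO mapreduce.Job: Running job: job_local1886679574_0001"
other = "20/01/09 02:27:39 INFO mapreduce.Job: Running job: job_1569582055993_48877"
print(submission_mode(edge), submission_mode(other))
```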

Also, is it possible to share the yarn-site.xml and mapred-site.xml files? You can redact any host-related information.
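
For the redaction, one option is to replace every host name under your domain with a stable placeholder, so that the two ResourceManagers remain distinguishable after scrubbing. The sketch below is a rough filter, not a guarantee that nothing sensitive remains: it only handles names under the domain suffix you pass in (the `example.com` values here are made up), it does not touch IP addresses, and the output should still be reviewed by hand.

```python
import re

def redact_hosts(text: str, domain: str) -> str:
    """Replace every host name under `domain` with a stable placeholder (host1, host2, ...)."""
    pattern = re.compile(
        r"\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\." + re.escape(domain) + r"\b"
    )
    aliases = {}

    def alias(match):
        # The same real host always maps to the same placeholder, so the
        # relationships between properties stay readable after redaction.
        return aliases.setdefault(match.group(0).lower(), "host%d" % (len(aliases) + 1))

    return pattern.sub(alias, text)

# Made-up fragment in the shape of a yarn-site.xml with ResourceManager HA.
sample = """<property><name>yarn.resourcemanager.address.rm1</name><value>rm1.example.com:8032</value></property>
<property><name>yarn.resourcemanager.address.rm2</name><value>rm2.example.com:8032</value></property>
<property><name>yarn.resourcemanager.webapp.address.rm1</name><value>rm1.example.com:8088</value></property>"""
print(redact_hosts(sample, "example.com"))
```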

Thanks
Eric