Member since 06-25-2017
29 Posts
2 Kudos Received
3 Solutions

My Accepted Solutions
Views | Posted
---|---
2396 | 08-06-2020 10:56 AM
3049 | 10-20-2019 07:10 PM
9524 | 06-28-2017 08:53 AM
08-15-2020
04:28 AM
Thanks @Prakashcit. If I am not wrong, to open a support case we need a license with Cloudera. This is currently our test cluster, and we don't have a license/subscription with Cloudera. Thanks for your help, @Prakashcit.
08-13-2020
01:54 PM
Thanks @Prakashcit for pointing me to the Hive JIRA for this bug. I see that the Fix Version is 4.0.0; does that mean the fix is not available in CDH 6.2 (Hive 2.1)? Could you provide any guidance on how to apply this patch to the version of Hive I am running? Thanks in advance!
08-09-2020
03:38 PM
Hi All, We've recently started using CDH 6.2 with Hive 2.1.1. In our existing jobs (on the old cluster), we set a number of properties by default, one of which is `hive.exec.parallel=true`. On the new cluster (Hive 2.1.1 / CDH 6.2.1), when we run jobs with `hive.exec.parallel=true`, beeline does not write any console logs showing the job status, the application URL, and other information like the following:

INFO : Running with YARN Application = application_1596901153465_0484
INFO : Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/yarn application -kill application_1596901153465_0484
INFO : Hive on Spark Session Web UI URL: http://us-east-1a-test-east-cdh-tasknode670.throtle-test.internal:45956
INFO :
Query Hive on Spark job[0] stages: [0, 1, 2]
INFO : Spark job[0] status = RUNNING
INFO : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount
INFO : 2020-08-09 04:28:23,096 Stage-0_0: 0(+1)/3 Stage-1_0: 0/64 Stage-2_0: 0/64
INFO : 2020-08-09 04:28:26,106 Stage-0_0: 0(+1)/3 Stage-1_0: 0/64 Stage-2_0: 0/64
INFO : 2020-08-09 04:28:27,110 Stage-0_0: 0(+3)/3 Stage-1_0: 0/64 Stage-2_0: 0/64

I am able to get these logs **only if I disable** parallel execution (`hive.exec.parallel=false`). Can anyone help me get these console logs even with parallel execution enabled?
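The Job Progress Format header in the log spells out how each stage entry is encoded. As an illustration, the Python 3.9+ sketch below decodes one progress line into per-stage counters. It is written only from that header and the sample lines: the optional failed-task part (assumed to appear as `-N` or `,-N` inside the parentheses) is an assumption, and `parse_progress_line` and `StageProgress` are hypothetical names, not part of Hive or beeline.

```python
import re
from typing import NamedTuple


class StageProgress(NamedTuple):
    stage_id: int
    attempt_id: int
    succeeded: int
    running: int
    failed: int
    total: int


# Leading timestamp, e.g. "2020-08-09 04:28:23,096".
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+")
# StageId_AttemptId: Succeeded[(+Running[,-Failed])]/Total
_STAGE = re.compile(
    r"Stage-(\d+)_(\d+):\s*(\d+)"
    r"(?:\(\+(\d+)(?:,?-(\d+))?\))?"
    r"/(\d+)"
)


def parse_progress_line(line: str) -> tuple[str, list[StageProgress]]:
    """Split one progress line into its timestamp and per-stage counters."""
    # Drop the "INFO : " prefix that beeline prints, if present.
    text = line.split("INFO :", 1)[-1].strip()
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"not a progress line: {line!r}")
    stages = [
        StageProgress(int(sid), int(att), int(ok), int(run or 0), int(fail or 0), int(total))
        for sid, att, ok, run, fail, total in _STAGE.findall(text[match.end():])
    ]
    return match.group(1), stages
```

For example, the last sample line decodes to Stage 0 with 0 succeeded and 3 running tasks out of 3, and Stages 1 and 2 not yet started (0 of 64 tasks each).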
Labels: Apache Hive
08-06-2020
10:56 AM
1 Kudo
All, I was able to fix this issue by changing the permissions to 755 on the directories /usr/lib64/python2.7/site-packages and /usr/lib/python2.7/site-packages on the server running the Hue service.
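A non-root service user can only list a directory if every ancestor directory grants it execute permission and the directory itself grants read, which is why the mode of a parent such as site-packages matters. The Python 3.9+ sketch below, using the hypothetical helper `blocked_components`, walks a path from `/` down and reports components missing the "other" execute or read bit. It assumes the Hue service user is neither the owner nor in the group of those directories, and it checks only classic mode bits (not ACLs or SELinux).

```python
import os
import stat
from pathlib import Path


def blocked_components(target: str) -> list[str]:
    """List path components that block a user who is neither owner nor group.

    Each directory on the way to `target` needs the "other" execute bit
    (o+x) to be traversed; `target` itself also needs "other" read (o+r)
    to be listed. Only classic mode bits are checked, not ACLs or SELinux.
    """
    path = Path(target).resolve()
    problems = []
    # Walk from "/" down to the target directory.
    for component in [*reversed(path.parents), path]:
        mode = os.stat(component).st_mode
        if not mode & stat.S_IXOTH:
            problems.append(f"{component}: missing o+x")
        if component == path and not mode & stat.S_IROTH:
            problems.append(f"{component}: missing o+r")
    return problems
```

This also illustrates why opening up only the leaf directory is not enough: a 700 parent still blocks traversal even when the child is 777.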
08-05-2020
12:55 PM
Hi All, We're trying to install a new cluster with Cloudera Manager 6.2.0 (CDH 6.2.1). While adding the Hue service, I am getting the error "Unexpected error. Unable to verify database connection." in Cloudera Manager. I checked the Cloudera Manager server log in /var/log/cloudera-scm-server/cloudera-scm-server.log; here is what it logged at the time of the Hue test connection failure:

2020-08-05 11:49:55,499 INFO scm-web-436:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2020-08-05 11:49:55,506 INFO scm-web-436:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult,
Status:200
2020-08-05 11:49:57,541 INFO scm-web-370:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2020-08-05 11:49:57,548 INFO scm-web-370:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult,
Status:200
2020-08-05 11:49:59,588 INFO scm-web-436:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2020-08-05 11:49:59,596 INFO scm-web-436:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult,
Status:200
2020-08-05 11:50:01,634 INFO scm-web-370:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2020-08-05 11:50:01,642 INFO scm-web-370:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult,
Status:200
2020-08-05 11:50:02,044 INFO CommandPusher:com.cloudera.cmf.service.AbstractOneOffHostCommand: Unsuccessful 'HueTestDatabaseConnection'
2020-08-05 11:50:02,047 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: Command exited with code: 1
2020-08-05 11:50:02,047 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: + '[' syncdb = is_db_alive ']'
+ '[' ldaptest = is_db_alive ']'
+ exec /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/bin/hue is_db_alive
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/bin/hue", line 9, in <module>
from pkg_resources import load_entry_point
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3250, in <module>
@_call_aside
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3234, in _call_aside
f(*args, **kwargs)
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3263, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 574, in _build_master
ws = cls()
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 567, in __init__
self.add_entry(entry)
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 623, in add_entry
for dist in find_distributions(entry, True):
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2065, in find_on_path
for dist in factory(fullpath):
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hue/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2127, in distributions_from_metadata
if len(os.listdir(path)) == 0:
if len(os.listdir(path)) == 0:
OSError: [Errno 13] Permission denied: '/usr/lib64/python2.7/site-packages/simplejson-3.17.2.dist-info'

Just to check whether it's really a permission issue, I changed the permissions on the above directory to 777, and it still failed with the same error. Please note that the test database connection for Hive and Oozie works fine; only Hue fails.

OS: CentOS 7.8; Python: 2.7.5
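To reproduce the failing step outside Cloudera Manager, the sketch below mimics the `os.listdir` call that fails in `distributions_from_metadata` in the traceback: it tries to list every `*.dist-info` / `*.egg-info` entry on a search path and collects the `OSError`s instead of stopping at the first one. `unreadable_metadata_dirs` is a hypothetical Python 3.9+ helper, not part of Hue or setuptools. It only finds permission problems when run as the Hue service user (typically `hue`), because root bypasses file permission checks.

```python
import os
import sys
from typing import Iterable, Optional


def unreadable_metadata_dirs(search_path: Optional[Iterable[str]] = None) -> list[tuple[str, str]]:
    """Try to list every *.dist-info / *.egg-info entry on the search path.

    Mirrors the os.listdir() call that failed in the traceback and returns
    (path, error) pairs instead of stopping at the first failure.
    """
    failures = []
    for base in sys.path if search_path is None else search_path:
        try:
            entries = os.listdir(base or ".")  # "" in sys.path means cwd
        except (FileNotFoundError, NotADirectoryError):
            continue  # missing directories and zip files on the path are normal
        except OSError as exc:
            failures.append((base, str(exc)))
            continue
        for name in entries:
            if not name.lower().endswith((".dist-info", ".egg-info")):
                continue
            full = os.path.join(base, name)
            try:
                os.listdir(full)
            except NotADirectoryError:
                continue  # an .egg-info can be a plain file
            except OSError as exc:
                failures.append((full, str(exc)))
    return failures
```

For example, passing `['/usr/lib64/python2.7/site-packages', '/usr/lib/python2.7/site-packages']` checks the two directories named in the error and in the fix.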
Labels: Cloudera Hue
07-24-2020
10:14 AM
Hi @Bender, As described in the link, I tried to produce thread and heap dump files from the running process, but as I mentioned earlier, those processes were getting killed or throwing errors. Here is the output I get when I run jmap as per the doc/link provided:

[yarn@us-east-1a-test-east-cdh-tasknode5152 process]$ ps -fe | grep nodemanager
yarn 5235 12503 0 13:07 ? 00:00:00 /usr/lib/jvm/java-openjdk/bin/java -Dproc_nodemanager -Xmx1000m -Djava.net.preferIPv4Stack=true -server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dlibrary.leveldbjni.path=/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER -Dhadoop.event.appender=,EventCatcher -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/CD-YARN-QafZaOEK_CD-YARN-QafZaOEK-NODEMANAGER-4756e03a64cd1a4e535550d4cd740b08_pid5235.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-cmf-CD-YARN-QafZaOEK-NODEMANAGER-us-east-1a-test-east-cdh-tasknode5152.throtle-test.internal.log.out -Dyarn.log.file=hadoop-cmf-CD-YARN-QafZaOEK-NODEMANAGER-us-east-1a-test-east-cdh-tasknode5152.throtle-test.internal.log.out -Dyarn.home.dir=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/lib/native -classpath 
/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER:/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER:/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-mapreduce/.//*:/usr/share/cmf/lib/plugins/event-publish-5.16.2-shaded.jar:/usr/share/cmf/lib/plugins/tt-instrumentation-5.16.2.jar:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/lib/*:/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
yarn 5240 5235 0 13:07 ? 00:00:00 python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/yarn/yarn.sh nodemanager
yarn 5487 31141 0 13:07 pts/0 00:00:00 grep --color=auto nodemanager
[yarn@us-east-1a-test-east-cdh-tasknode5152 process]$ /usr/lib/jvm/java-openjdk/bin/jmap -heap 5235 > /tmp/jmap_5235_heap.out
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.tools.jmap.JMap.runTool(JMap.java:201)
at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: java.lang.NullPointerException
at sun.jvm.hotspot.tools.HeapSummary.run(HeapSummary.java:157)
at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:50)
... 6 more

Here is the error message I get when I run jstack with the -l option:

[yarn@us-east-1a-test-east-cdh-tasknode5152 process]$ ps -fe | grep nodemanager
yarn 4518 12503 0 13:04 ? 00:00:00 /usr/lib/jvm/java-openjdk/bin/java -Dproc_nodemanager -Xmx1000m -Djava.net.preferIPv4Stack=true -server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dlibrary.leveldbjni.path=/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER -Dhadoop.event.appender=,EventCatcher -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/CD-YARN-QafZaOEK_CD-YARN-QafZaOEK-NODEMANAGER-4756e03a64cd1a4e535550d4cd740b08_pid4518.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-cmf-CD-YARN-QafZaOEK-NODEMANAGER-us-east-1a-test-east-cdh-tasknode5152.throtle-test.internal.log.out -Dyarn.log.file=hadoop-cmf-CD-YARN-QafZaOEK-NODEMANAGER-us-east-1a-test-east-cdh-tasknode5152.throtle-test.internal.log.out -Dyarn.home.dir=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/lib/native -classpath 
/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER:/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER:/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-mapreduce/.//*:/usr/share/cmf/lib/plugins/event-publish-5.16.2-shaded.jar:/usr/share/cmf/lib/plugins/tt-instrumentation-5.16.2.jar:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/lib/*:/run/cloudera-scm-agent/process/109-yarn-NODEMANAGER/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
yarn 4523 4518 0 13:04 ? 00:00:00 python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/yarn/yarn.sh nodemanager
yarn 4717 31141 0 13:05 pts/0 00:00:00 grep --color=auto nodemanager
[yarn@us-east-1a-test-east-cdh-tasknode5152 process]$ /usr/lib/jvm/java-openjdk/bin/jstack -F 4518 > /tmp/jstack_4518_f.out
[yarn@us-east-1a-test-east-cdh-tasknode5152 process]$ less /tmp/jstack_4518_f.out
[yarn@us-east-1a-test-east-cdh-tasknode5152 process]$ /usr/lib/jvm/java-openjdk/bin/jstack -l 4518 > /tmp/jstack_4518_f.out
4518: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

For process 4518, I ran jstack with the -F option, and here is the output:

Debugger attached successfully.
Server compiler detected.
JVM version is 25.181-b13
Deadlock Detection:
No deadlocks found.
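In the `ps -fe | grep nodemanager` listings above, the pattern also matches the `cmf-redactor` wrapper and the `grep` itself. A stricter lookup, sketched here in Python 3.9+ under the assumption of a Linux `/proc` filesystem, reports only processes whose executable path ends in `java` and whose arguments include the NodeManager main class. `find_java_pids` is a hypothetical helper; the `proc_root` parameter exists only so it can be tested against a fake directory tree.

```python
from pathlib import Path

NM_MAIN_CLASS = "org.apache.hadoop.yarn.server.nodemanager.NodeManager"


def find_java_pids(main_class: str = NM_MAIN_CLASS, proc_root: str = "/proc") -> list[int]:
    """Return PIDs of java processes that have `main_class` as an argument."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited, or cmdline not readable
        # /proc/<pid>/cmdline holds NUL-separated arguments.
        args = raw.decode(errors="replace").split("\0")
        if args and args[0].endswith("java") and main_class in args:
            pids.append(int(entry.name))
    return sorted(pids)
```

The returned PID is the one to pass to `jstack` or `jmap`, run as the `yarn` user as in the transcript.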
07-23-2020
08:47 AM
Thanks for providing the link on JVM analysis; it's really helpful. Yes, in my case the process was not responding, so I should have used `kill -3`. I will try it next time and share the results.
07-23-2020
04:10 AM
Hi @Bender, Yes, we checked the ports and verified that they're accessible, and yes, it's very strange to see an issue like this. I was not able to do anything, since the process (YARN NODEMANAGER) keeps running but does nothing, not even writing logs. In some cases a reboot of the server works and in some cases a service restart works, but in other cases even a server reboot doesn't help. I have tried the following:

- Changed the log level to DEBUG to see whether that writes any logs. It didn't work; nothing was written to the log.
- Since the Java process is running (the supervisor process thinks the NM is running because the process stayed up for more than 20 seconds), I tried to take a thread dump to see whether I could find anything. I used jcmd, but running it kills the hung process and a new process spins up (and if I do the same on the new process, it gets killed again): jcmd <pid> Thread.print >> /path/to/file
- Checked for deadlocks with jstack -F; the result showed no deadlocks.

Please let me know if there is anything else I can check to resolve the issue.
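Checking port accessibility can be scripted. The sketch below only tests whether a TCP connection to the NodeManager web port succeeds; 8042 is the default port of `yarn.nodemanager.webapp.address`, which is the port the 07-21-2020 post says cloudera-scm-agent cannot reach. `port_accepts_connections` is a hypothetical helper. A successful connect shows only that something is listening: the kernel completes the TCP handshake for a listening socket even if the application never serves the request.

```python
import socket


def port_accepts_connections(host: str, port: int = 8042, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False
```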
07-21-2020
12:45 PM
Thanks @Bender for the detailed instructions. Yes, I am still facing this issue and have been trying to chase it down for the last few days. I tried the commands you provided, and here is the output:

[root@us-east-1a-test-east-cdh-corenode4163 cloudera-scm-agent]# export SUPER_CONF=/var/run/cloudera-scm-agent/supervisor/supervisord.conf
[root@us-east-1a-test-east-cdh-corenode4163 cloudera-scm-agent]# /usr/lib64/cmf/agent/build/env/bin/supervisorctl -c $SUPER_CONF status
138-cluster-host-inspector EXITED Jul 21 03:00 PM
201-hdfs-DATANODE RUNNING pid 96936, uptime 0:11:58
209-hbase-REGIONSERVER RUNNING pid 97227, uptime 0:11:31
214-yarn-NODEMANAGER RUNNING pid 97307, uptime 0:11:30
cmflistener RUNNING pid 7412, uptime 4:53:45
flood RUNNING pid 7449, uptime 4:53:44
[root@us-east-1a-test-east-cdh-corenode4163 cloudera-scm-agent]#

So in my case I don't have the problem of two NM instances running on a single node. Also, here is the process output from the ps command:

yarn 97307 0.0 0.0 2858224 23704 ? Sl 15:22 0:00 /usr/lib/jvm/java-openjdk/bin/java -Dproc_nodemanager -Xmx1000m -Djava.net.preferIPv4Stack=true -server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dlibrary.leveldbjni.path=/run/cloudera-scm-agent/process/214-yarn-NODEMANAGER -Dhadoop.event.appender=,EventCatcher -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/CD-YARN-LGzbJezU_CD-YARN-LGzbJezU-NODEMANAGER-f953f0c79fd5345f10fb347aa90e7500_pid97307.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-cmf-CD-YARN-LGzbJezU-NODEMANAGER-us-east-1a-test-east-cdh-corenode4163.throtle-test.internal.log.out -Dyarn.log.file=hadoop-cmf-CD-YARN-LGzbJezU-NODEMANAGER-us-east-1a-test-east-cdh-corenode4163.throtle-test.internal.log.out -Dyarn.home.dir=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/lib/native -classpath 
/run/cloudera-scm-agent/process/214-yarn-NODEMANAGER:/run/cloudera-scm-agent/process/214-yarn-NODEMANAGER:/run/cloudera-scm-agent/process/214-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-mapreduce/.//*:/usr/share/cmf/lib/plugins/event-publish-5.16.2-shaded.jar:/usr/share/cmf/lib/plugins/tt-instrumentation-5.16.2.jar:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop-yarn/lib/*:/run/cloudera-scm-agent/process/214-yarn-NODEMANAGER/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager

Yes, the NodeManager is given 1 GB of heap by default, and I even tried giving it 2 GB and 4 GB, but the issue persists. Please note that I see this issue on random servers on every restart of the cluster/YARN service. (As mentioned earlier, the process is running but no logs are written; cloudera-scm-agent just complains that it can't connect to port 8042.) Could you suggest any other ways to debug this issue?

Note: not sure if it helps, but when we tested the cluster with small nodes (r4.xlarge), we didn't see this issue. We started seeing it when we increased the node size to r4.8xlarge.
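The check for duplicate NodeManager instances can also be scripted against `supervisorctl status` output. This hypothetical Python 3.9+ sketch (`duplicate_running_roles`) assumes the `<n>-<service>-<ROLE>  STATE ...` line layout shown in the output above: it strips the numeric process-directory prefix and reports any role with more than one RUNNING entry.

```python
import re
from collections import Counter

# "<n>-<service>-<ROLE>   STATE   details", where <n> is the process-directory id.
_STATUS_LINE = re.compile(r"^(?:\d+-)?(\S+)\s+([A-Z]+)\b")


def duplicate_running_roles(status_output: str) -> list[str]:
    """Return roles that appear more than once in RUNNING state."""
    running = Counter()
    for line in status_output.splitlines():
        match = _STATUS_LINE.match(line.strip())
        if match and match.group(2) == "RUNNING":
            running[match.group(1)] += 1
    return sorted(role for role, count in running.items() if count > 1)
```

On the status output above it returns an empty list, which agrees with the conclusion that only one NodeManager is running on that node.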
07-20-2020
05:42 AM
A quick update: even though I was not able to find the root cause or a fix for it, I tried rebooting the server running the NodeManager service, and now the NodeManager is up and running. However, on every service restart I see the same issue of the NodeManager not running, and I have to reboot the server to get it up again.