CDH 5.12.0 pseudo-distributed mode: NodeManager lost

Explorer

 

OS:

 

Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.3 LTS
Release:	16.04
Codename:	xenial

 

Kernel:

 

Linux ThinkPad-T510 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

 

IPv6 disabled:

 

GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1"

 

My hosts file is:

 

127.0.0.1 localhost.localdomain localhost
127.0.0.1 ThinkPad-T510.localdomain ThinkPad-T510

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

 

My resolv.conf:

 

nameserver 127.0.1.1
search fios-router.home

 

My /etc/nsswitch.conf:

passwd:         compat
group:          compat
shadow:         compat
gshadow:        files

hosts:          files mdns4_minimal [NOTFOUND=return] dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

My yarn-site.xml:

<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.dispatcher.exit-on-error</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://localhost:8020/var/log/hadoop-yarn/apps</value>
  </property>

  <property>
    <description>Classpath for typical applications.</description>
     <name>yarn.application.classpath</name>
     <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
     </value>
  </property>

  <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>128</value>
  </property>

  <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>2048</value>
  </property>

  <property>
      <name>yarn.scheduler.increment-allocation-mb</name>
      <value>128</value>
  </property>

  <property>
      <name>yarn.scheduler.minimum-allocation-vcores</name>
      <value>1</value>
  </property>

  <property>
      <name>yarn.scheduler.maximum-allocation-vcores</name>
      <value>3</value>
  </property>

  <property>
      <name>yarn.scheduler.increment-allocation-vcores</name>
      <value>1</value>
  </property>

  <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2048</value>
  </property>

  <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>3</value>
  </property>

  <property>
      <name>mapreduce.map.memory.mb</name>
      <value>256</value>
  </property>

  <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>256</value>
  </property>

  <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx256m</value>
  </property>

</configuration>

My mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>

  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>

  <property>
    <description>To set the value of tmp directory for map and reduce tasks.</description>
    <name>mapreduce.task.tmp.dir</name>
    <value>/var/lib/hadoop-mapreduce/cache/${user.name}/tasks</value>
  </property>

</configuration>

After some time (600 secs) I see the error below in the ResourceManager logs. It looks like an environment or network misconfiguration; maybe somebody else has faced the same issue.

2017-08-23 16:48:05,001 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:localhost.localdomain:38301 Timed out after 600 secs
2017-08-23 16:48:05,002 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node localhost.localdomain:38301 as it is now LOST
2017-08-23 16:48:05,002 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: localhost.localdomain:38301 Node Transitioned from RUNNING to LOST

Before that, the NM logs show a successful registration:

2017-08-23 16:16:33,249 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as localhost.localdomain:38301 with total resource of <memory:2048, vCores:3>
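For readers unfamiliar with the "Timed out after 600 secs" message: the RM keeps a last-heartbeat timestamp per node and marks a node LOST once that timestamp is older than the expiry interval. To my knowledge the interval is controlled by `yarn.nm.liveness-monitor.expiry-interval-ms` and defaults to 600000 ms, which would match the log. The following is only a toy model of that idea, not Hadoop's code; the class and method names are made up for illustration.

```python
import time


class LivelinessMonitor:
    """Toy model of an expiry-based liveness check; not Hadoop's actual code."""

    def __init__(self, expiry_secs=600.0, clock=time.monotonic):
        self.expiry_secs = expiry_secs  # assumed to mirror the 600 s in the log
        self.clock = clock              # injectable so the demo needs no real waiting
        self.last_ping = {}             # node id -> time of the last heartbeat seen

    def register(self, node_id):
        """Start tracking a node from the moment it registers."""
        self.last_ping[node_id] = self.clock()

    def received_ping(self, node_id):
        """Refresh the timestamp, but only for a node id that is being tracked."""
        if node_id in self.last_ping:
            self.last_ping[node_id] = self.clock()

    def expire_stale(self):
        """Drop and return every node whose last heartbeat is older than the interval."""
        now = self.clock()
        stale = [n for n, seen in self.last_ping.items() if now - seen > self.expiry_secs]
        for node_id in stale:
            del self.last_ping[node_id]
        return stale


class FakeClock:
    """Manually advanced clock used only for the demonstration."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
monitor = LivelinessMonitor(expiry_secs=600.0, clock=clock)
monitor.register("localhost.localdomain:38301")
clock.now = 601.0  # no heartbeat was recorded in between
print(monitor.expire_stale())
```

In this model a node that registers successfully can still expire if none of its later heartbeats are recorded under the same node id, which is consistent with the NM registering fine and then being marked LOST.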

 

Champion

Could you share the NodeManager logs?

Meanwhile, it is always a good idea to have the hosts file look something like this:

192.168.121.14       server1

/etc/hostname 

server1

Meanwhile, let me know what your Cloudera Manager agent config file looks like:

/etc/cloudera-scm-agent/config.ini

Explorer

Unfortunately, I do not use CM in pseudo-distributed mode. Here are the fresh NM logs:

NM_logs

After a restart the NM works as expected for 600 seconds, then the RM marks it as LOST.

Champion

Could you please share the logs using the "Insert code" tool in the toolbar? It is next to the bold, italic, underline, and spoiler buttons in the reply text area.

Explorer

Sure, here they are:

2017-08-31 10:06:52,126 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NodeManager
STARTUP_MSG:   user = yarn
STARTUP_MSG:   host = ThinkPad-T510.localdomain/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0-cdh5.12.0
STARTUP_MSG:   classpath = /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:.....
STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd ; compiled by 'jenkins' on 2017-06-29T11:35Z
STARTUP_MSG:   java = 1.8.0_144
************************************************************/
2017-08-31 10:06:52,150 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2017-08-31 10:06:53,895 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
2017-08-31 10:06:53,898 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
2017-08-31 10:06:53,899 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
2017-08-31 10:06:53,900 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
2017-08-31 10:06:53,901 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
2017-08-31 10:06:53,902 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
2017-08-31 10:06:53,953 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
2017-08-31 10:06:53,954 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
2017-08-31 10:06:54,036 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-08-31 10:06:54,176 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-08-31 10:06:54,176 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2017-08-31 10:06:54,218 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
2017-08-31 10:06:54,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
2017-08-31 10:06:54,285 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: usercache path : file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache_DEL_1504188414223
2017-08-31 10:06:54,295 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache_DEL_1504188414223/eseliavka
2017-08-31 10:06:54,430 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
2017-08-31 10:06:54,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2017-08-31 10:06:54,481 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service httpshuffle, "mapreduce_shuffle"
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@4e07b95f
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:  Using ResourceCalculatorProcessTree : null
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: false
2017-08-31 10:06:54,617 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=2048 virtual-memory=4301 virtual-cores=3
2017-08-31 10:06:54,629 INFO org.apache.hadoop.util.NodeHealthScriptRunner: health status being set as 
2017-08-31 10:06:54,629 INFO org.apache.hadoop.util.NodeHealthScriptRunner: health status being set as 
2017-08-31 10:06:54,698 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 2000
2017-08-31 10:06:54,739 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 39979
2017-08-31 10:06:55,102 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2017-08-31 10:06:55,102 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2017-08-31 10:06:55,103 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2017-08-31 10:06:55,103 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 39979: starting
2017-08-31 10:06:55,121 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : ThinkPad-T510.localdomain:39979
2017-08-31 10:06:55,143 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500
2017-08-31 10:06:55,144 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2017-08-31 10:06:55,148 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2017-08-31 10:06:55,149 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2017-08-31 10:06:55,149 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2017-08-31 10:06:55,150 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2017-08-31 10:06:55,212 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2017-08-31 10:06:55,232 INFO org.apache.hadoop.mapred.ShuffleHandler: httpshuffle listening on port 13562
2017-08-31 10:06:55,234 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at ThinkPad-T510.localdomain/127.0.1.1:39979
2017-08-31 10:06:55,234 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2017-08-31 10:06:55,236 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2017-08-31 10:06:55,347 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2017-08-31 10:06:55,362 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2017-08-31 10:06:55,374 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2017-08-31 10:06:55,412 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2017-08-31 10:06:55,418 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2017-08-31 10:06:55,418 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2017-08-31 10:06:55,418 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2017-08-31 10:06:55,431 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2017-08-31 10:06:55,431 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2017-08-31 10:06:55,453 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2017-08-31 10:06:55,453 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2017-08-31 10:06:55,645 INFO org.mortbay.log: Extract jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.6.0-cdh5.12.0.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2017-08-31 10:06:56,217 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2017-08-31 10:06:56,217 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2017-08-31 10:06:57,079 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2017-08-31 10:06:57,091 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2017-08-31 10:06:57,116 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
2017-08-31 10:06:57,179 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2017-08-31 10:06:57,199 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2017-08-31 10:06:57,610 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -268042821
2017-08-31 10:06:57,621 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1995863860
2017-08-31 10:06:57,622 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as ThinkPad-T510.localdomain:39979 with total resource of <memory:2048, vCores:3>
2017-08-31 10:06:57,622 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
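Two lines in this log are worth pulling out when debugging a LOST node: the address the NM uses to reach the RM and the node id it registers under. The sketch below extracts both with regular expressions written against the message formats visible in the log above; the function name is hypothetical and the patterns are an assumption that may not hold for other Hadoop versions.

```python
import re

# Patterns modeled on the two log messages shown in this thread.
RM_CONNECT = re.compile(r"Connecting to ResourceManager at \S*/(\S+):(\d+)")
REGISTERED = re.compile(r"Registered with ResourceManager as (\S+):(\d+)")


def summarize_nm_log(lines):
    """Return the RM address and the node id found in NodeManager log lines."""
    summary = {"rm_address": None, "node_id": None}
    for line in lines:
        match = RM_CONNECT.search(line)
        if match:
            summary["rm_address"] = f"{match.group(1)}:{match.group(2)}"
        match = REGISTERED.search(line)
        if match:
            summary["node_id"] = f"{match.group(1)}:{match.group(2)}"
    return summary


sample = [
    "2017-08-31 10:06:57,116 INFO org.apache.hadoop.yarn.client.RMProxy: "
    "Connecting to ResourceManager at /0.0.0.0:8031",
    "2017-08-31 10:06:57,622 INFO org.apache.hadoop.yarn.server.nodemanager."
    "NodeStatusUpdaterImpl: Registered with ResourceManager as "
    "ThinkPad-T510.localdomain:39979 with total resource",
]
print(summarize_nm_log(sample))
```

Here the NM contacts the RM at 0.0.0.0:8031, which is what one would expect when `yarn.resourcemanager.hostname` is left at its default, and it registers as ThinkPad-T510.localdomain, whereas the RM log in the question shows the node as localhost.localdomain.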

Champion

Change your hosts file and hostname:

192.168.200.21      server1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Also change the hostname. I am not sure of the path on Ubuntu, but it should be under
/etc/hostname -> server1

Restart the network.

Also run echo $hostname or echo $HOSTNAME to see whether the change is reflected.

Finally, restart all the daemons.

Let me know if that helps.
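As a complement to the echo check, here is a minimal sketch that reports which address the machine's hostname actually resolves to and whether it falls in the loopback range. It uses only the standard `socket` module; the function name and the returned dictionary layout are made up for illustration.

```python
import socket


def describe_host(hostname=None, resolve=socket.gethostbyname):
    """Report which address a hostname resolves to on this machine."""
    name = hostname or socket.gethostname()
    try:
        address = resolve(name)
    except OSError:  # socket.gaierror, raised on lookup failure, subclasses OSError
        address = None
    return {
        "hostname": name,
        "address": address,
        # 127.0.0.0/8 is the loopback range, e.g. 127.0.0.1 or 127.0.1.1
        "loopback": address is not None and address.startswith("127."),
    }
```

Calling `print(describe_host())` on the affected machine shows the name and address that local lookups return; the `resolve` parameter exists only so the function can be exercised without touching the real resolver.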

Explorer

Thanks for helping. I solved this issue after some debugging. Here is the solution that worked for me with Ubuntu 16.04 as the host operating system:

[14:15:59][0][eseliavka@ThinkPad-T510:/tmp]$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.3 LTS
Release:	16.04
Codename:	xenial

Add this to the NetworkManager dnsmasq configuration, so that dnsmasq also answers for the names in /etc/hosts:

[14:15:48][0][eseliavka@ThinkPad-T510:/tmp]$ cat /etc/NetworkManager/dnsmasq.d/hosts.conf 
addn-hosts=/etc/hosts

Restart NetworkManager with this command:

[14:16:39][0][eseliavka@ThinkPad-T510:/tmp]$ sudo /etc/init.d/network-manager restart

Add these two properties to yarn-site.xml:

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>

  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>localhost</value>
  </property>
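A quick way to confirm that both hostname properties really ended up in the file the daemons read is to parse it and look the names up. The sketch below does that with the standard library XML parser; the helper names are hypothetical, and it only checks presence, not whether the values are right for a given setup.

```python
import xml.etree.ElementTree as ET

# The two properties named in the solution above.
REQUIRED = ("yarn.resourcemanager.hostname", "yarn.nodemanager.hostname")


def load_properties(xml_text):
    """Turn the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_hostname_settings(xml_text):
    """List the required hostname properties that are absent or empty."""
    props = load_properties(xml_text)
    return [name for name in REQUIRED if not props.get(name)]
```

Pass in the contents of the active yarn-site.xml; the NM classpath in the log above suggests it lives under /etc/hadoop/conf, but that location is an assumption. An empty list means both properties are set.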

Champion

I am glad. Did you fix your IP/hostname in the /etc/hosts file?

Explorer

Thanks for pointing this out, I forgot about that detail. I also specified the FQDN in /etc/hostname:

[10:07:12][0][eseliavka@ThinkPad-T510:~]$ cat /etc/hostname 
ThinkPad-T510.localdomain

so that it matches my /etc/hosts file:

[09:58:12][0][eseliavka@ThinkPad-T510:~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.1.1 ThinkPad-T510.localdomain ThinkPad-T510

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
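The consistency between /etc/hostname and /etc/hosts described here can also be checked mechanically. This is a small sketch, with hypothetical function names, that parses hosts-file text into a name-to-addresses table and tests whether a given hostname has an entry.

```python
def parse_hosts(text):
    """Map every name in hosts-file text to the list of addresses it appears under."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(address)
    return table


def hostname_is_mapped(hosts_text, hostname):
    """True if the hostname (e.g. the content of /etc/hostname) has a hosts entry."""
    return hostname.strip() in parse_hosts(hosts_text)


hosts_text = """127.0.0.1 localhost.localdomain localhost
127.0.1.1 ThinkPad-T510.localdomain ThinkPad-T510
"""
print(parse_hosts(hosts_text)["ThinkPad-T510.localdomain"])
print(hostname_is_mapped(hosts_text, "ThinkPad-T510.localdomain\n"))
```

On a real machine the two inputs would come from reading /etc/hosts and /etc/hostname; the inline sample simply reuses the entries posted above.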