Created on 08-24-2017 03:22 PM - edited 09-16-2022 05:08 AM
OS:
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Kernel:
Linux ThinkPad-T510 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
IPv6 disabled:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1"
My hosts file is:
127.0.0.1 localhost.localdomain localhost
127.0.0.1 ThinkPad-T510.localdomain ThinkPad-T510
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
My resolv.conf
nameserver 127.0.1.1
search fios-router.home
My /etc/nsswitch.conf
passwd: compat
group: compat
shadow: compat
gshadow: files
hosts: files mdns4_minimal [NOTFOUND=return] dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.dispatcher.exit-on-error</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://localhost:8020/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.increment-allocation-mb</name>
    <value>128</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximun-allocation-vcores</name>
    <value>3</value>
  </property>
  <property>
    <name>yarn.scheduler.increment-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>3</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>256</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>256</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx256m</value>
  </property>
</configuration>
My mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
  <property>
    <description>To set the value of tmp directory for map and reduce tasks.</description>
    <name>mapreduce.task.tmp.dir</name>
    <value>/var/lib/hadoop-mapreduce/cache/${user.name}/tasks</value>
  </property>
</configuration>
After some time (600 secs) I see this error in the ResourceManager logs. It looks like an environment or network misconfiguration issue; maybe somebody else has run into it as well.
2017-08-23 16:48:05,001 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:localhost.localdomain:38301 Timed out after 600 secs
2017-08-23 16:48:05,002 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node localhost.localdomain:38301 as it is now LOST
2017-08-23 16:48:05,002 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: localhost.localdomain:38301 Node Transitioned from RUNNING to LOST
At the same time, in the NM (NodeManager) logs:
2017-08-23 16:16:33,249 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as localhost.localdomain:38301 with total resource of <memory:2048, vCores:3>
Created 08-28-2017 01:37 AM
Could you share the NodeManager logs?
In the meantime, it is always a good idea to have the hosts file look something like this:
192.168.121.14 server1
/etc/hostname
server1
Meanwhile, let me know what your Cloudera Manager agent ini file looks like:
/etc/cloudera-scm-agent/config.ini
Created 08-30-2017 04:02 PM
Unfortunately, I do not use CM in pseudo-distributed mode. These are the fresh logs of the NM:
After a restart the NM works as expected for 600 sec, then the RM marks it as LOST.
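The 600-second window matches what I believe is the default node expiry interval on the RM side (yarn.nm.liveness-monitor.expiry-interval-ms, 600000 ms). A minimal sketch of that behaviour, assuming the NM registers once and its later heartbeats never reach the RM; the class below is a toy stand-in, not Hadoop's implementation, and healthy-node:45454 is a made-up node for comparison:

```python
from dataclasses import dataclass, field

# Assumed default of yarn.nm.liveness-monitor.expiry-interval-ms (600000 ms).
EXPIRY_INTERVAL_SECS = 600


@dataclass
class LivenessMonitor:
    """Toy stand-in for the RM-side liveness monitor (illustration only)."""
    expiry_secs: int = EXPIRY_INTERVAL_SECS
    last_ping: dict = field(default_factory=dict)

    def ping(self, node_id: str, now: float) -> None:
        # Registration and every received heartbeat refresh the timestamp.
        self.last_ping[node_id] = now

    def expired(self, now: float) -> list:
        # Nodes whose last ping is older than the expiry interval count as LOST.
        return sorted(node for node, seen in self.last_ping.items()
                      if now - seen > self.expiry_secs)


monitor = LivenessMonitor()
monitor.ping("localhost.localdomain:38301", now=0)  # registration only
monitor.ping("healthy-node:45454", now=0)           # hypothetical second node
monitor.ping("healthy-node:45454", now=590)         # its heartbeat arrived
lost = monitor.expired(now=601)
print(lost)
```

If this model fits, the thing to investigate is why heartbeats after registration do not arrive, which points at hostname or address resolution rather than at resource settings.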
Created 08-31-2017 05:54 AM
Could you please share the logs using the "Insert code" tool in the toolbar? It is beside the bold, italic, underline, and spoiler buttons in the reply text area.
Created 08-31-2017 07:10 AM
Sure, here it is:
2017-08-31 10:06:52,126 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NodeManager
STARTUP_MSG: user = yarn
STARTUP_MSG: host = ThinkPad-T510.localdomain/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0-cdh5.12.0
STARTUP_MSG: classpath = /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:.....
STARTUP_MSG: build = http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd ; compiled by 'jenkins' on 2017-06-29T11:35Z
STARTUP_MSG: java = 1.8.0_144
************************************************************/
2017-08-31 10:06:52,150 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2017-08-31 10:06:53,895 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
2017-08-31 10:06:53,898 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
2017-08-31 10:06:53,899 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
2017-08-31 10:06:53,900 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
2017-08-31 10:06:53,901 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
2017-08-31 10:06:53,902 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
2017-08-31 10:06:53,953 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
2017-08-31 10:06:53,954 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
2017-08-31 10:06:54,036 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-08-31 10:06:54,176 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-08-31 10:06:54,176 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2017-08-31 10:06:54,218 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
2017-08-31 10:06:54,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
2017-08-31 10:06:54,285 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: usercache path : file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache_DEL_1504188414223
2017-08-31 10:06:54,295 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache_DEL_1504188414223/eseliavka
2017-08-31 10:06:54,430 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
2017-08-31 10:06:54,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2017-08-31 10:06:54,481 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service httpshuffle, "mapreduce_shuffle"
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@4e07b95f
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
2017-08-31 10:06:54,594 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: false
2017-08-31 10:06:54,617 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=2048 virtual-memory=4301 virtual-cores=3
2017-08-31 10:06:54,629 INFO org.apache.hadoop.util.NodeHealthScriptRunner: health status being set as
2017-08-31 10:06:54,629 INFO org.apache.hadoop.util.NodeHealthScriptRunner: health status being set as
2017-08-31 10:06:54,698 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 2000
2017-08-31 10:06:54,739 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 39979
2017-08-31 10:06:55,102 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2017-08-31 10:06:55,102 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2017-08-31 10:06:55,103 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2017-08-31 10:06:55,103 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 39979: starting
2017-08-31 10:06:55,121 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : ThinkPad-T510.localdomain:39979
2017-08-31 10:06:55,143 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500
2017-08-31 10:06:55,144 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2017-08-31 10:06:55,148 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2017-08-31 10:06:55,149 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2017-08-31 10:06:55,149 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2017-08-31 10:06:55,150 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2017-08-31 10:06:55,212 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2017-08-31 10:06:55,232 INFO org.apache.hadoop.mapred.ShuffleHandler: httpshuffle listening on port 13562
2017-08-31 10:06:55,234 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at ThinkPad-T510.localdomain/127.0.1.1:39979
2017-08-31 10:06:55,234 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2017-08-31 10:06:55,236 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2017-08-31 10:06:55,347 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2017-08-31 10:06:55,362 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2017-08-31 10:06:55,374 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2017-08-31 10:06:55,412 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2017-08-31 10:06:55,418 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2017-08-31 10:06:55,418 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2017-08-31 10:06:55,418 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2017-08-31 10:06:55,431 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2017-08-31 10:06:55,431 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2017-08-31 10:06:55,453 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2017-08-31 10:06:55,453 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2017-08-31 10:06:55,645 INFO org.mortbay.log: Extract jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.6.0-cdh5.12.0.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2017-08-31 10:06:56,217 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2017-08-31 10:06:56,217 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2017-08-31 10:06:57,079 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2017-08-31 10:06:57,091 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2017-08-31 10:06:57,116 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
2017-08-31 10:06:57,179 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2017-08-31 10:06:57,199 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2017-08-31 10:06:57,610 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -268042821
2017-08-31 10:06:57,621 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1995863860
2017-08-31 10:06:57,622 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as ThinkPad-T510.localdomain:39979 with total resource of <memory:2048, vCores:3>
2017-08-31 10:06:57,622 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
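Two lines in this log are worth pulling out when debugging a LOST node: the address the NM dials to reach the RM, and the node ID it registers under. A small sketch that extracts both; summarize_nm_log is a hypothetical helper name, and the regular expressions are written against the message text shown in the log above only:

```python
import re

# Patterns for two NodeManager log messages that appear in the log above.
RM_ADDR = re.compile(r"Connecting to ResourceManager at (\S+)")
NODE_ID = re.compile(r"Registered with ResourceManager as (\S+)")


def summarize_nm_log(lines):
    """Return the RM address the NM dialed and the node ID it registered as."""
    summary = {"rm_address": None, "node_id": None}
    for line in lines:
        if (m := RM_ADDR.search(line)):
            summary["rm_address"] = m.group(1)
        if (m := NODE_ID.search(line)):
            summary["node_id"] = m.group(1)
    return summary


sample = [
    "2017-08-31 10:06:57,116 INFO org.apache.hadoop.yarn.client.RMProxy: "
    "Connecting to ResourceManager at /0.0.0.0:8031",
    "2017-08-31 10:06:57,622 INFO org.apache.hadoop.yarn.server.nodemanager."
    "NodeStatusUpdaterImpl: Registered with ResourceManager as "
    "ThinkPad-T510.localdomain:39979 with total resource of <memory:2048, vCores:3>",
]
summary = summarize_nm_log(sample)
print(summary)
```

Here the NM dials /0.0.0.0:8031, which suggests no explicit RM hostname is configured, and it registers under the machine's FQDN, which resolves to 127.0.1.1 according to the STARTUP_MSG host line.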
Created on 09-02-2017 06:47 PM - edited 09-02-2017 06:48 PM
Change your hosts file and hostname:
192.168.200.21 server1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Also change the hostname. I am not sure of the path on Ubuntu, but it should be under
/etc/hostname -> server1
Restart the network.
Also do an echo $hostname or $HOSTNAME to check that the change is reflected.
Finally, restart all the daemons.
Let me know if that helps.
Created 09-04-2017 11:20 AM
Thanks for helping. I solved this issue after some debugging. Here is the solution that worked for me with Ubuntu 16.04 as the host operating system:
[14:15:59][0][eseliavka@ThinkPad-T510:/tmp]$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Add this to the NetworkManager dnsmasq configuration:
[14:15:48][0][eseliavka@ThinkPad-T510:/tmp]$ cat /etc/NetworkManager/dnsmasq.d/hosts.conf
addn-hosts=/etc/hosts
Restart NetworkManager with the command:
[14:16:39][0][eseliavka@ThinkPad-T510:/tmp]$ sudo /etc/init.d/network-manager restart
Add these two properties to yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>localhost</value>
</property>
<property>
  <name>yarn.nodemanager.hostname</name>
  <value>localhost</value>
</property>
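To confirm that the two hostname properties actually made it into the file the daemons read, something like the following can help. It is only a sketch: read_properties is a hypothetical helper, the inline sample stands in for the real file, and /etc/hadoop/conf/yarn-site.xml is an assumed location based on the classpath in the NM log.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Map each <name> to its <value> in a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


# Inline sample; for a real check, read the file instead, for example
# open("/etc/hadoop/conf/yarn-site.xml").read() (assumed path).
sample = """
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>localhost</value>
  </property>
</configuration>
"""

props = read_properties(sample)
for key in ("yarn.resourcemanager.hostname", "yarn.nodemanager.hostname"):
    print(key, "=", props.get(key, "<not set>"))
```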
Created 09-04-2017 05:39 PM
I am glad. Did you fix your IP/hostname in the /etc/hosts file?
Created 09-05-2017 07:09 AM
Thanks for pointing this out, I forgot about that detail. I also set the FQDN as the hostname
[10:07:12][0][eseliavka@ThinkPad-T510:~]$ cat /etc/hostname
ThinkPad-T510.localdomain
in line with my /etc/hosts file
[09:58:12][0][eseliavka@ThinkPad-T510:~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.1.1 ThinkPad-T510.localdomain ThinkPad-T510
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
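A quick way to check that the name in /etc/hostname is covered by /etc/hosts is sketched below. parse_hosts is a hypothetical helper, and the file contents are pasted in as strings so the example does not depend on the machine it runs on:

```python
def parse_hosts(text):
    """Map every name in hosts-file text to the first address that lists it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)
    return mapping


hosts_text = """\
127.0.0.1 localhost.localdomain localhost
127.0.1.1 ThinkPad-T510.localdomain ThinkPad-T510
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
"""
hostname = "ThinkPad-T510.localdomain"  # content of /etc/hostname

mapping = parse_hosts(hosts_text)
print(hostname, "->", mapping.get(hostname, "<missing from /etc/hosts>"))
```

This only checks the files against each other; it does not query the resolver, so dnsmasq or nsswitch.conf behaviour is outside what it can confirm.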