Member since: 04-06-2017
Posts: 17
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1143 | 03-07-2018 01:46 AM |
05-18-2019
03:29 PM
I am trying to start the HDFS service from Ambari. All the subcomponents come up except the NodeManager service under HDFS. I am seeing the error message below in /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<Node_Name>.log:
2019-05-17 13:36:03,850 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - NodeManager metrics system stopped.
2019-05-17 13:36:03,850 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NodeManager metrics system shutdown complete.
2019-05-17 13:36:03,851 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: java.net.BindException: Address already in use
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.serviceInit(AuxiliaryServiceWithCustomClassLoader.java:65)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:162)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:291)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:504)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1226)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:495)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:480)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:213)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:464)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at java.lang.Thread.run(Thread.java:748)
2019-05-17 13:36:03,853 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG: /************************************************************
Observations made:
- No process is listening on the NodeManager ports, 0.0.0.0:45454 and 8042.
- No NodeManager process is running either: ps -ef | grep nodemanager returns an empty result.
Any help would be appreciated.
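The stack trace above passes through AuxServices.serviceInit and a Netty bind, which suggests the port that is already in use may belong to a NodeManager auxiliary service rather than to 45454 or 8042. A minimal sketch for probing candidate ports by trying to bind to them is shown here. The function name port_is_free is hypothetical, and the extra ports 13562 and 7337 are only the usual defaults for the MapReduce shuffle handler and the Spark external shuffle service; whether either is configured on this cluster is an assumption.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP listener can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Mirror typical server behaviour so sockets lingering in TIME_WAIT
        # are not reported as busy.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # anything else (e.g. permission denied) is a different problem
        return True


if __name__ == "__main__":
    # 45454 and 8042 come from the post; 13562 and 7337 are assumed candidates.
    for candidate in (45454, 8042, 13562, 7337):
        print(candidate, "free" if port_is_free(candidate) else "IN USE")
```

Run it on the NodeManager host while the NodeManager is stopped. Any port reported as in use is held by some other process and is worth looking up with netstat or ss.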
03-07-2018
01:46 AM
Thanks @Geoffrey Shelton Okot. The issue has been resolved, and once again I learned the importance of the /etc/hosts file. The firewall was not blocking the connection. Rather, the process was bound to an address internal to the instance, so none of the other instances could reach it. ZooKeeper looks up the host's IP address from /etc/hosts when it starts the process; instead of the real IP address it picked up the loopback address (127.0.0.1), which meant the outside world could not access the process. I followed this thread to resolve the issue: MeaningOfIPaddressinProcess
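A quick way to spot this kind of /etc/hosts problem is to check what the machine's own hostname resolves to. The sketch below is a minimal illustration, not part of any Hadoop or ZooKeeper tooling; the function names are hypothetical. It flags any address in the loopback range, which covers both 127.0.0.1 and 127.0.1.1.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True for any loopback address (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(hostname=None):
    """Resolve a hostname (default: this machine's FQDN) to an IPv4 address
    and report whether it maps to loopback."""
    hostname = hostname or socket.getfqdn()
    address = socket.gethostbyname(hostname)
    return hostname, address, is_loopback(address)


if __name__ == "__main__":
    try:
        name, addr, loopback = check_hostname()
    except socket.gaierror as exc:
        print("hostname does not resolve:", exc)
    else:
        verdict = "loopback, other hosts cannot reach it" if loopback else "ok"
        print(f"{name} -> {addr} ({verdict})")
```

If the hostname resolves to a loopback address, services that bind to the resolved address will only be reachable from the same machine, which matches the connection refused errors seen from the other nodes.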
03-07-2018
12:07 AM
@Geoffrey Shelton Okot We have already disabled the firewall on all the hosts in the cluster. Also, the port for which we are getting connection refused is the one whose process is bound internally to the instance, meaning only localhost can access that process. I am not sure why we are getting connection refused for a process that is running on the instance itself. I have attached a screenshot showing the process bound to 127.0.1.1. Any inputs would be appreciated.
03-06-2018
11:10 PM
@Geoffrey Shelton Okot Thanks for the response. ZooKeeper is running on these ports; I am attaching the process screenshots: zookeeper-server1.png, zookeeper-server2.png. I am also not able to telnet to that port from the node where we are seeing the error, for example telnet host1/host2 3888. Could this be because a firewall has been set up? I am able to telnet to port 2181, though, which I thought was the default ZooKeeper port. Please confirm.
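The telnet check described above can be scripted so that every host and port pair is tested in one pass. This is a minimal sketch with a hypothetical function name, can_connect. The ports 2181 (client) and 3888 (election) are the ones mentioned in this thread, and the host names are taken from the command line because the real ones are not shown in the post.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    hosts = sys.argv[1:]
    if not hosts:
        print("usage: pass one or more ZooKeeper host names as arguments")
    for host in hosts:
        for port in (2181, 3888):
            state = "open" if can_connect(host, port) else "unreachable"
            print(f"{host}:{port} {state}")
```

A port that is open from the ZooKeeper host itself but unreachable from its peers points at the listening address or a firewall rather than at the process being down.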
03-06-2018
07:46 PM
I am trying to bring up a Hortonworks cluster. Below are the services in the cluster that I am trying to install:
- Zookeeper
- Ambari Metrics
- HDFS
- YARN
- MR2
Out of the above services, I was able to bring up the Zookeeper and Ambari Metrics services, but the other services (HDFS, YARN and MR2) are not coming up. The NameNode is also not coming up. I am trying to install the cluster on 3 nodes, with HA enabled. When I checked the HDFS alerts, one of the critical alerts was that the Zookeeper Failover Controller hasn't been started. After googling, I tried to format it using the command hdfs zkfc -formatZK -nonInteractive, but I get the same error as in the Ambari UI. My feeling is that the ZKFC startup failure is preventing the other Hadoop services from starting. Below is the error message from the Zookeeper logs:
2018-03-06 13:34:20,580 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address Host2/ip3-host4:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
I am attaching the following items:
- Zookeeper log from the namenode
- Ambari UI log
I have been stuck on this for the past 2 days. I tried uninstalling and reinstalling the cluster 2 times but I am still getting the same error. Any inputs would be appreciated.
09-18-2017
07:33 PM
Below is the error thrown. Any help in this regard would be much appreciated.
stderr: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 259, in <module>
HiveMetastore().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 51, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 567, in install_packages
retry_count=agent_stack_retry_count)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 83, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 91, in _call_with_retries
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install hive2_2_5_6_0_40' returned 1. Error Downloading Packages:
hive2_2_5_6_0_40-jdbc-2.1.0.2.5.6.0-40.el6.noarch: failure: hive2/hive2_2_5_6_0_40-jdbc-2.1.0.2.5.6.0-40.el6.noarch.rpm from HDP-2.5: [Errno 256] No more mirrors to try.
hive2_2_5_6_0_40-2.1.0.2.5.6.0-40.el6.noarch: failure: hive2/hive2_2_5_6_0_40-2.1.0.2.5.6.0-40.el6.noarch.rpm from HDP-2.5: [Errno 256] No more mirrors to try.
tez_hive2_2_5_6_0_40-0.8.4.2.5.6.0-40.el6.noarch: failure: tez_hive2/tez_hive2_2_5_6_0_40-0.8.4.2.5.6.0-40.el6.noarch.rpm from HDP-2.5: [Errno 256] No more mirrors to try.
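The "[Errno 256] No more mirrors to try" message generally means yum could not download the package from any URL configured for the repository, here HDP-2.5. One thing worth checking is where that repository's baseurl points. The sketch below parses a yum .repo file, which uses INI syntax, and lists the baseurl of each section. The function name, the sample contents and the example URL are all hypothetical, and the path of the actual repo file on the host (commonly something under /etc/yum.repos.d/) is an assumption.

```python
import configparser
import sys

# Hypothetical sample; the real file on the host will differ.
SAMPLE = """
[HDP-2.5]
name=HDP-2.5
baseurl=http://repo.example.invalid/HDP/centos6/2.x/updates/2.5.6.0
enabled=1
gpgcheck=0
"""


def repo_baseurls(repo_text):
    """Map each repo section id in a yum .repo file to its baseurl (or None)."""
    # Interpolation is disabled so that '%' characters in URLs are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        section: parser.get(section, "baseurl", fallback=None)
        for section in parser.sections()
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            text = handle.read()
    else:
        text = SAMPLE
    for repo_id, url in repo_baseurls(text).items():
        print(repo_id, "->", url)
```

Once the baseurl is known, fetching repodata/repomd.xml beneath it from the failing host shows whether the repository is reachable at all. Running yum clean all before retrying rules out stale cached metadata.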
09-18-2017
07:25 PM
Hi @Sonu Sahi: After resetting Ambari I was able to get back to the installation screen again. Thanks.
09-15-2017
03:54 PM
ambari-step9-blank.jpg
05-09-2017
05:02 PM
@Jay SenSharma: Thanks for the quick response. I am able to telnet and ping from the client machine. iptables and ip6tables were not turned off at first. I had already tried turning them off and it didn't work; I have now turned them off again but am still getting the issue. I am also attaching the agent logs after restarting ambari-server with iptables turned off: agentlog01.txt
05-09-2017
03:35 PM
Hi @Jay SenSharma: Below is the output of the commands.
ps -ef | grep `cat /var/run/ambari-server/ambari-server.pid`
root 22996 1 0 May08 ? 00:07:03 /usr/jdk64/jdk1.8.0_60/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dsun.zip.disableMemoryMapping=true -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp /etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/share/java/postgresql-jdbc.jar org.apache.ambari.server.controller.AmbariServer
root 25774 21782 0 08:02 pts/0 00:00:00 grep 22996
netstat -tnlpa | grep `cat /var/run/ambari-server/ambari-server.pid`
tcp 0 0 :::8080 :::* LISTEN 22996/java
tcp 0 0 :::8440 :::* LISTEN 22996/java
tcp 0 0 :::8441 :::* LISTEN 22996/java
tcp 0 0 ::ffff:xx.zz.aaa.yyy:8441 ::ffff:xx.zz.aaa.yyy:53556 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44955 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44926 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44923 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44963 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44964 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44921 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
tcp 0 0 ::ffff:127.0.0.1:44962 ::ffff:127.0.0.1:5432 ESTABLISHED 22996/java
I am attaching the output of ambari-server.out.
05-09-2017
07:41 AM
Hi @Jay SenSharma: As I mentioned, no errors are generated in the ambari-server.log file. There are only WARN and INFO messages. Please let me know if you need the file.
05-09-2017
07:40 AM
Hi @Pardeep: As I mentioned, no errors are generated in the ambari-server.log file. There are only WARN and INFO messages. Please let me know if you need the file.
05-09-2017
07:37 AM
Hi @Palanivelrajan Chellakutty: Below are the answers to your questions.
- Yes, it is running on a VM.
- Could you please explain what you mean by reloading on VMs? The Hadoop cluster is a 2-node cluster with 1 gateway node and 1 worker node, where I am trying to create a testing environment.
- Memory details: below is the output of some CPU and memory commands.
--lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: XXX
CPU family: 21
Model: 0
Stepping: 2
CPU MHz: 2400.028
BogoMIPS: 4800.05
Hypervisor vendor: YY
Virtualization type: full
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 12288K
--free -m
                    total   used   free  shared  buffers  cached
Mem:                32113  10895  21217       0      379    9043
-/+ buffers/cache:          1472  30640
Swap:                5119      0   5119
--uname -mrs
Linux 2.6.32-431.el6.x86_64 x86_64
Please let me know if you need more info.
05-08-2017
09:37 PM
1 Kudo
I am trying to start the Ambari server. The server status is running, but I am not able to view its UI on port 8080. No errors are logged in /var/log/ambari-server/ambari-server.log, but there are errors logged in /var/log/ambari-agent/ambari-agent.log. Below are the two errors logged in /var/log/ambari-agent/ambari-agent.log:
ERROR 2017-05-08 14:27:20,201 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on http://xxx.YYY.com']
ERROR 2017-05-08 14:27:20,184 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://xxx.YYY.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: <urlopen error [Errno 111] Connection refused>\n)']
I get the following when I try to reach the Ambari UI:
...Loading...
I have been working on this issue for the past 2 days and have not been able to find any related information or resolution on the internet. It would be great if anyone could give some input on this issue.
04-06-2017
05:46 PM
Do you know where the logs are?