Created 12-19-2016 03:33 AM
I have the latest HDP 2.5 installed, managed by Ambari. The Zeppelin Notebook service never starts. The logs show that an exception is thrown with the message "This address is already in use".
Some searching online suggested that the problem may be caused by another program using Zeppelin's port 9995. However, the netstat command does not show anything using this port. I also tried changing the port to 9996, and that did not help either. It looks like the exception's message is misleading and the problem is somewhere else.
Any help on this would be appreciated.
Created 12-21-2016 04:09 PM
It turned out that the problem was quite different. The Zeppelin service was actually working fine; Ambari simply did not report it properly. The "Address is already in use" message appeared simply because I tried to start the Zeppelin service again while another instance of it was already running (but nobody knew about it).
So this particular problem (Address is already in use) is now solved. However, a new problem has started.
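For anyone hitting the same situation, one way to check whether an instance is already running, independently of what Ambari reports, is to look at the process behind the service's pid file. The following is only a minimal sketch: it assumes the service writes a pid file, and the path in the usage comment is a hypothetical placeholder, not a confirmed Zeppelin location.

```python
import os


def pid_file_process_alive(pid_file):
    """Return True if pid_file names a process that currently exists."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat as "not running".
        return False
    try:
        # Signal 0 performs the existence/permission check without
        # actually delivering a signal to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# Hypothetical usage; replace the path with the real pid file on your host:
# print(pid_file_process_alive("/path/to/zeppelin.pid"))
```

If this returns True while Ambari shows the service as stopped, a second start attempt would be expected to fail with an "address already in use" error like the one above.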
Created 12-19-2016 04:27 AM
@Dmitry Otblesk Can you please share the logs?
Created 12-20-2016 04:52 PM
I no longer have access to the previous log, as I deleted the Zeppelin service and tried to reinstall it. During the installation it threw an error. Below is the content of the stderr output:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0.2.5/package/scripts/master.py", line 330, in <module>
    Master().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0.2.5/package/scripts/master.py", line 174, in start
    self.create_zeppelin_dir(params)
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0.2.5/package/scripts/master.py", line 70, in create_zeppelin_dir
    recursive_chmod=True
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 247, in action_delayed
    self._assert_valid()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 231, in _assert_valid
    self.target_status = self._get_file_status(target)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 292, in _get_file_status
    list_status = self.util.run_command(target, 'GETFILESTATUS', method='GET', ignore_status_codes=['404'], assertable_result=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command
    _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://lenu.dom.hdp:50070/webhdfs/v1/user/zeppelin?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpeaTzEm 2>/tmp/tmpwrNLnT' returned 7. curl: (7) Failed connect to lenu.dom.hdp:50070; Connection refused
000
Created 12-19-2016 10:46 AM
What exactly did you check with netstat?
If you've only checked which ports are used by listening services (netstat -l), I suggest checking all ports. I've seen cases where Hadoop services tried to listen on ports already in use as the source port of other TCP connections:
netstat -anp|grep 9995
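Note that a plain grep matches the digits 9995 anywhere on a line, including inode numbers and PIDs. As a rough sketch of a stricter filter, the snippet below keeps only TCP/UDP lines whose local address ends in the given port. The function name and the sample text are made up for illustration, and it assumes the usual Linux netstat column layout (protocol, Recv-Q, Send-Q, local address, foreign address, ...).

```python
def lines_with_local_port(netstat_output, port):
    """Return netstat lines whose local address uses exactly this port."""
    suffix = ":" + str(port)
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only tcp/udp rows carry an address:port pair in column 4.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        local_address = fields[3]
        if local_address.endswith(suffix):
            hits.append(line)
    return hits


# Made-up sample output for illustration only.
SAMPLE = """\
tcp        0      0 0.0.0.0:19995      0.0.0.0:*          LISTEN      1111/java
tcp        0      0 10.0.0.5:9995      10.0.0.7:50010     ESTABLISHED 2222/java
unix  3      [ ]    STREAM     CONNECTED     49995    3333/some-daemon
"""

for hit in lines_with_local_port(SAMPLE, 9995):
    print(hit)
```

In this sample only the second row is reported: the first uses port 19995, and the unix-socket row merely contains the digits in its inode number.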
Created 12-20-2016 04:56 PM
I tried the command you suggested. It returned only one row:
unix 3 [ ] STREAM CONNECTED 49995 5239/tracker-store
So, there is nothing on port 9995.
Created 12-20-2016 08:58 PM
Hmm, okay, but I would expect this to be transient, since we're talking about outgoing TCP connections. The connection that could have caused the issue is no longer there.
So it's more important to check netstat at around the same time you start Zeppelin and get the "This address is already in use" error.
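Since the timing matters, another option is to ask the kernel directly by attempting to bind the port right before or after the failed start. This is just a minimal sketch; the function name is my own, and it only reports the state at the instant it runs.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            # SO_REUSEADDR is deliberately not set, so a conflicting
            # socket on this port makes bind() fail.
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return False
            # Anything else (e.g. permission denied) is a different problem.
            raise
        return True


# Example: check Zeppelin's port from the question.
# print(port_is_free(9995))
```

If this returns False while netstat shows no listener, the exact-port netstat check is worth repeating at that same moment.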
The other error, the one you got when you tried to install Zeppelin again, is not from Zeppelin but from Ambari trying to create the zeppelin user's home folder in HDFS. It looks like HDFS (WebHDFS in this case) is not working, so please check that (lenu.dom.hdp on port 50070).
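A quick way to confirm the "Connection refused" independently of Ambari is a plain TCP connect to the NameNode's HTTP port. The sketch below uses only the host and port that appear in the error message; the function name is just for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example with the endpoint from the stderr output:
# print(can_connect("lenu.dom.hdp", 50070))
```

A False result here points at the NameNode (or its web UI port) being down or unreachable rather than at anything Zeppelin-specific.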