Created 02-19-2018 06:27 PM
I have set up an Ambari cluster with 9 nodes. Everything was working perfectly. For some reason I had to move the data directory I had given for HDFS to another disk. I stopped all the services and mounted the new hard disk properly on the same old mount point, then rebooted all the nodes after mounting the new device. Now when I start all the services again, every service fails. What could be the reason? Do I need to do any additional step here?
Created 02-19-2018 08:57 PM
After the change, all the service startups are failing. Can you please share some of the service logs so that we can see what they are complaining about?
The logs should give us a useful hint about why the services fail.
Created 02-20-2018 05:00 AM
When I stop all services and start them again, I get the below error while starting the Zeppelin Notebook service.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0/package/scripts/master.py", line 522, in <module>
    Master().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 367, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0/package/scripts/master.py", line 254, in start
    self.create_zeppelin_dir(params)
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0/package/scripts/master.py", line 89, in create_zeppelin_dir
    replace_existing_files=True,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 604, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 601, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 336, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 352, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 467, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 177, in run_command
    return self._run_command(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 235, in _run_command
    _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar -H 'Content-Type: application/octet-stream' 'http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/apps/zeppelin/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar?op=CREATE&user.name=hdfs&overwrite=True&permission=444' 1>/tmp/tmp0f3h5s 2>/tmp/tmpLIZ7_n' returned 55.
curl: (55) Send failure: Connection reset by peer
201
Created 02-20-2018 05:19 AM
It looks like the following API call is failing with "Connection reset by peer"
# curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar -H 'Content-Type: application/octet-stream' 'http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/apps/zeppelin/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar?op=CREATE&user.name=hdfs&overwrite=True&permission=444'
.
This generally happens when the communication is broken (or timed out by some firewall rule) while uploading the JAR to HDFS via the WebHDFS API "http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/apps/zeppelin/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar".
Test-1) Can you please check how much time this API call takes from a few of your cluster nodes? Also check whether you are able to put this JAR into HDFS using the same API call from any of your cluster nodes (a rough example is sketched after Test-2).
Test-2) While you are trying to upload this JAR, please check your NameNode log to see if there are any errors or warnings.
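Something along these lines should work for Test-1. This is only a sketch: the destination path /tmp/webhdfs-test.jar is an example, and you may need to run the commands as (or sudo to) the hdfs user on one of the cluster nodes.

  # Time the same WebHDFS PUT that the Zeppelin start script is issuing.
  # Only the destination path has been changed to a throwaway example file.
  time curl -sS -L -w '%{http_code}' -X PUT \
    --data-binary @/usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar \
    -H 'Content-Type: application/octet-stream' \
    'http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/tmp/webhdfs-test.jar?op=CREATE&user.name=hdfs&overwrite=True'

  # The hdfs CLI exercises the same NameNode -> DataNode write path and is a quick cross-check:
  sudo -u hdfs hdfs dfs -put -f \
    /usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar /tmp/

If the curl call hangs or resets while it is redirected to a DataNode, that points at the DataNodes (or the network path to them) rather than the NameNode.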
.
Created 02-20-2018 11:04 AM
In addition to that, my HBase service is not able to start due to the below error. The EBS volume I changed is the one holding the DataNode data directory I configured, and I had data in HBase before doing that. Do I need to do anything else to avoid this?
2018-02-20 11:56:44,465 WARN [Thread-70] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1649)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
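The "could only be replicated to 0 nodes" message usually means the NameNode sees the DataNodes but none of them has usable storage registered. A rough set of checks after the remount, assuming the common HDP default data directory /hadoop/hdfs/data (the actual dfs.datanode.data.dir on this cluster may differ):

  # Does each of the 4 DataNodes report non-zero configured and remaining capacity?
  sudo -u hdfs hdfs dfsadmin -report

  # On each DataNode: is the new EBS volume really mounted at the old mount point,
  # and is the data directory still owned by hdfs:hadoop with the expected permissions?
  df -h /hadoop/hdfs/data
  ls -ld /hadoop/hdfs/data

  # Free space as HDFS itself sees it:
  sudo -u hdfs hdfs dfs -df -h /

If the new volume is mounted but the data directory is missing or owned by root, the DataNodes will start yet offer no storage, which matches both the Zeppelin upload failure and this HBase error.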
Created 02-21-2018 05:44 AM