
Ambari: all services fail to restart after mounting a new file system for the DataNodes

Contributor

I have set up an Ambari cluster with 9 nodes, and everything was working perfectly. For some reason I had to move the data directory I had configured for HDFS to another disk. I stopped all the services and mounted the new disk at the same old mount point, then rebooted all the nodes after mounting the new device. Now when I start all the services again, they all fail. What could be the reason? Do I need to do any additional steps here?
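
A quick sanity check of the new mount before starting services (the path /hadoop/hdfs/data below is a placeholder; substitute whatever dfs.datanode.data.dir is set to in Ambari):

# Placeholder path: use your actual dfs.datanode.data.dir value
df -h /hadoop/hdfs/data                    # confirm the new device is mounted at the old mount point
ls -ld /hadoop/hdfs/data                   # the DataNode expects this directory to be owned by hdfs:hadoop
chown -R hdfs:hadoop /hadoop/hdfs/data     # a freshly formatted volume typically comes up root-owned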

1 ACCEPTED SOLUTION

Contributor

I was able to resolve my issue. Thanks a lot @Jay Kumar SenSharma. I have added the solution here.


5 REPLIES

Master Mentor

@Irshad Muhammed

After the change, all of the service startups are failing. Can you please share some of the service logs so that we can see what they are complaining about?

The logs should give us a useful hint about why the services fail.
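
For example (typical HDP log locations; the exact log directory can differ on your cluster):

# DataNode and NameNode logs on the affected nodes
tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log
# Ambari agent log on the node where a start operation failed
tail -n 100 /var/log/ambari-agent/ambari-agent.log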

Contributor

@Jay Kumar SenSharma

When I stop and then start all services, I get the error below from the Zeppelin Notebook start step.

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0/package/scripts/master.py", line 522, in <module>
    Master().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 367, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0/package/scripts/master.py", line 254, in start
    self.create_zeppelin_dir(params)
  File "/var/lib/ambari-agent/cache/common-services/ZEPPELIN/0.6.0/package/scripts/master.py", line 89, in create_zeppelin_dir
    replace_existing_files=True,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 604, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 601, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 336, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 352, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 467, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 177, in run_command
    return self._run_command(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 235, in _run_command
    _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar -H 'Content-Type: application/octet-stream' 'http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/apps/zeppelin/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar?op=CREATE&user.name=hdfs&overwrite=True&permission=444' 1>/tmp/tmp0f3h5s 2>/tmp/tmpLIZ7_n' returned 55. curl: (55) Send failure: Connection reset by peer
201

Master Mentor

@Irshad Muhammed

It looks like the following API call is failing with "Connection reset by peer":

# curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar -H 'Content-Type: application/octet-stream' 'http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/apps/zeppelin/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar?op=CREATE&user.name=hdfs&overwrite=True&permission=444'


This generally happens when the connection is broken (or timed out by a firewall rule) while uploading the JAR to HDFS through the WebHDFS API endpoint "http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/apps/zeppelin/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar".

Test-1) Can you please check how long this API call takes from a few of your cluster nodes? Also check whether you are able to put this JAR into HDFS using the same API call from any of your cluster nodes.

Test-2) While you are uploading this JAR, please check your NameNode log to see if there are any errors or warnings.
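
For example, from one of the cluster nodes (this reuses the JAR and NameNode address from your failing command; /tmp/webhdfs-test.jar is just a scratch target):

# Test-1: time the same WebHDFS upload to a scratch path
time curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/current/zeppelin-server/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0.2.5.3.0-37.jar -H 'Content-Type: application/octet-stream' 'http://ip-172-31-31-102.ec2.internal:50070/webhdfs/v1/tmp/webhdfs-test.jar?op=CREATE&user.name=hdfs&overwrite=True'

# Test-2: in another terminal, watch the NameNode log while the upload runs
# (default HDP location; adjust if your log directory differs)
tail -f /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | grep -iE 'warn|error'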


Contributor

@Jay Kumar SenSharma

In addition to that, my HBase service is not able to start due to the error below. The EBS volume I changed holds the DataNode directory I configured, and I had data in HBase before making that change. Do I need to do anything else to avoid this?

8-02-20 11:56:44,465 WARN  [Thread-70] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1).  There are 4 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1649)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
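
This "could only be replicated to 0 nodes" error usually means that, although the DataNodes are registered, none of them can currently accept a block; after a disk swap that typically points at the new data directories (wrong ownership or permissions, or the DataNodes reporting no usable capacity). A quick way to see what the NameNode thinks of the DataNodes' storage:

# Per-DataNode configured and remaining capacity; zero or missing capacity
# points at the newly mounted data directories
sudo -u hdfs hdfs dfsadmin -report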


