Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1964 | 06-15-2020 05:23 AM |
| | 16020 | 01-30-2020 08:04 PM |
| | 2108 | 07-07-2019 09:06 PM |
| | 8244 | 01-27-2018 10:17 PM |
| | 4664 | 12-31-2017 10:12 PM |
01-01-2018
08:29 PM
In our Ambari cluster (version 2.6) we have 3 master nodes, with the Spark2 Thrift Server installed on master01 and master03. When we start both of them, or just one, the Spark Thrift Server runs for a short time (about 30 seconds) and then fails back. In the ambari-agent log we can see the following details about the Spark Thrift Server. What could be the problem?

INFO 2018-01-01 22:14:51,827 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:02,732 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:06,054 StatusCommandsExecutor.py:65 - Adding STATUS_COMMAND for component SPARK2_THRIFTSERVER of service SPARK2 of cluster hdp to the queue.
INFO 2018-01-01 22:15:06,153 StatusCommandsExecutor.py:65 - Adding STATUS_COMMAND for component SPARK2_CLIENT of service SPARK2 of cluster hdp to the queue.
INFO 2018-01-01 22:15:06,501 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:13,347 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:23,356 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
Labels:
- Apache Ambari
- Apache Hadoop
- Apache Spark
01-01-2018
05:27 PM
We have 3 worker machines (DataNode machines) in our Ambari cluster. Each worker machine has the following components: DataNode (HDFS), Metrics Monitor, and NodeManager (YARN). First, how do we stop/start all of these components on a single worker machine only? Second, is it possible to stop/start these components on all worker machines at once, instead of stopping/starting each worker machine individually?
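A sketch of how this can be done through Ambari's REST API, which supports changing the state of every host component on a host in a single call (setting `state` to `INSTALLED` stops the components, `STARTED` starts them). The Ambari URL, cluster name, and worker host names below are placeholders for illustration; adjust them to your environment and verify against your Ambari version.

```python
import json

AMBARI = "http://ambari-host:8080"   # hypothetical Ambari server address
CLUSTER = "hdp"                      # hypothetical cluster name

def host_components_request(host, state):
    """Build the URL and JSON body for a bulk stop/start of all
    components on one host.

    state = "INSTALLED" stops the components, "STARTED" starts them.
    """
    url = "%s/api/v1/clusters/%s/hosts/%s/host_components" % (AMBARI, CLUSTER, host)
    body = {
        "RequestInfo": {"context": "Stop/start all components on %s" % host},
        "Body": {"HostRoles": {"state": state}},
    }
    return url, json.dumps(body)

# Looping over the workers covers the "all worker machines" case:
for worker in ["worker01", "worker02", "worker03"]:  # hypothetical host names
    url, body = host_components_request(worker, "INSTALLED")
    # send with e.g.:
    #   curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d "$body" "$url"
    print(url)
```

Each PUT returns a request id that can be polled under `/api/v1/clusters/<cluster>/requests/<id>` to watch progress.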
Labels:
- Apache Ambari
- Apache Hadoop
01-01-2018
05:11 PM
We want to understand the required procedure before adding disks to each worker machine. We have an Ambari cluster (version 2.6) with 3 DataNode machines (worker machines), and we want to add 5 new disks to each worker. Before adding the new disks, must we stop the components on each worker machine (the components are DataNode, Metrics Monitor, and NodeManager), or should we instead restart all affected services/components in the entire cluster?
Labels:
- Apache Ambari
- Apache Hadoop
- Apache YARN
01-01-2018
01:48 PM
We want to capture the values of yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs via the API, in order to use these values in a bash script. For example, for the values shown in the Ambari GUI: what is the API syntax for retrieving yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs?
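A sketch of the usual two-step flow against Ambari's config API (endpoint layout is Ambari's standard REST layout; the cluster name and the sample response below are hypothetical): first `GET /api/v1/clusters/<cluster>?fields=Clusters/desired_configs` to find the current tag for the `yarn-site` config type, then `GET /api/v1/clusters/<cluster>/configurations?type=yarn-site&tag=<tag>` to fetch the properties. The extraction a script would do on the second response looks like this:

```python
# Parse a /configurations?type=yarn-site response and pull out the
# two NodeManager directory properties.

def extract_properties(config_response, keys):
    """Pull selected properties out of a configurations-API reply."""
    props = config_response["items"][0]["properties"]
    return {k: props[k] for k in keys}

# A trimmed, hypothetical example of the second response:
sample = {
    "items": [{
        "type": "yarn-site",
        "tag": "version1514800000000",
        "properties": {
            "yarn.nodemanager.local-dirs": "/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local",
            "yarn.nodemanager.log-dirs": "/grid/sdb/hadoop/yarn/log,/grid/sdc/hadoop/yarn/log",
        },
    }]
}

wanted = extract_properties(
    sample, ["yarn.nodemanager.local-dirs", "yarn.nodemanager.log-dirs"])
print(wanted["yarn.nodemanager.local-dirs"])
```

From bash, the same two GETs can be issued with `curl -u admin:admin` and the properties picked out of the JSON.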
Labels:
- Apache Ambari
- Apache Hadoop
- Apache YARN
12-31-2017
10:19 PM
We have an Ambari cluster (version 2.6) with 3 worker machines (DataNode machines). Each worker machine now has 10 disks (after we added 5 disks to each worker machine). How can we verify from the Ambari GUI that all disks are OK and that Ambari recognizes the disk sizes? (We can already see them with df -h on each worker machine.)
Labels:
- Apache Ambari
- Apache Hadoop
12-31-2017
10:12 PM
You need to configure the file /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist with all of the mount points, then start the DataNode on each worker machine.
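For reference, a sketch of what the file looks like once updated. The paths here are hypothetical examples, and the one `data_dir,mount_point` pair per line layout is our reading of the file, so compare against your own copy before editing:

```
# data_dir,mount_point
/grid/sdg/hadoop/hdfs/data,/grid/sdg
/grid/sdh/hadoop/hdfs/data,/grid/sdh
/grid/sdi/hadoop/hdfs/data,/grid/sdi
```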
12-31-2017
07:26 PM
We have an Ambari cluster (version 2.6) with 3 worker machines (DataNode machines). Each worker machine had 5 disks (each disk is 20G), and we upgraded each worker machine to 10 disks as follows:

1. On each worker we created a filesystem on each of the 5 new disks and mounted the partitions.
2. We added the new disks to dfs.datanode.data.dir, yarn.nodemanager.local-dirs, and yarn.nodemanager.log-dirs (in the Ambari GUI).
3. We restarted HDFS and YARN.

But when we restart HDFS on the workers (DataNode machines), we get the following errors:

2017-12-31 19:01:52,509 -
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
Directory /grid/sdg/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdg .
Directory /grid/sdh/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdh .
Directory /grid/sdi/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdi .
Directory /grid/sdj/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdj .
Directory /grid/sdk/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdk . Please ensure that mounts are healthy. If the mount change was intentional, you can update the contents of /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist.
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
So for now all the new disks are empty, without a hadoop folder under /grid/sdX. Please advise how to continue from this stage. (We already restarted the ambari-agent and the ambari-server, without any improvement.)
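As the warning itself says, if the mount change was intentional the fix is to update /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist so that each data dir maps to its current mount point. A minimal sketch of regenerating those entries, assuming the file holds one `data_dir,mount_point` pair per line (that layout is our assumption, and the data-dir list below is an example):

```python
import os

def find_mount_point(path):
    """Walk up from `path` until we reach a mount point (or /)."""
    path = os.path.abspath(path)
    while path != os.sep and not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

def hist_lines(data_dirs):
    """Produce one "data_dir,mount_point" line per configured data dir."""
    return ["%s,%s" % (d, find_mount_point(d)) for d in data_dirs]

# Example data dirs; substitute the values from dfs.datanode.data.dir.
example_dirs = ["/grid/sdg/hadoop/hdfs/data", "/grid/sdh/hadoop/hdfs/data"]
for line in hist_lines(example_dirs):
    print(line)
```

Run on the worker itself (so the real mounts are visible), write the lines into the .hist file, then restart the DataNode.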
Labels:
- Apache Ambari
- Apache Hadoop
12-21-2017
04:54 PM
@Joe P, we are asking about SSH after the Ambari server and all nodes are already installed (it is clear that SSH is needed during installation). The question is whether SSH is still needed after the Ambari cluster is already installed.
12-21-2017
03:52 PM
Hi all, we installed an Ambari cluster with 3 master machines (the Ambari server is installed on master02), 25 worker machines, and 5 Kafka machines. Does the ambari-server need SSH access to all the other machines in the cluster? And are all the files under /root/.ssh/ on the ambari-server machine used by the Ambari cluster?
Labels:
- Apache Ambari
- Apache Hadoop
12-20-2017
07:20 PM
We cannot start the App Timeline Server; we get the following logs when we try to start it. How can we fix this issue?

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 155, in <module>
ApplicationTimelineServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 762, in restart
self.start(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 44, in start
self.configure(env) # FOR SECURITY
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 117, in locking_configure
original_configure(obj, *args, **kw)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 55, in configure
yarn(name='apptimelineserver')
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 337, in yarn
mode=0755
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 555, in action_create_on_execute
self.action_delayed("create")
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 552, in action_delayed
self.get_hdfs_resource_executor().action_delayed(action_name, self)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 288, in action_delayed
self._set_mode(self.target_status)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in _set_mode
self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 199, in run_command
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT 'http://master01.sys65.com:50070/webhdfs/v1/ats/done?op=SETPERMISSION&user.name=hdfs&permission=755'' returned status_code=403.
{
"RemoteException": {
"exception": "RetriableException",
"javaClassName": "org.apache.hadoop.ipc.RetriableException",
"message": "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 425 needs additional 6 blocks to reach the threshold 0.9900 of total blocks 435.\nThe number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
}
}
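The root cause is the RemoteException at the end of the traceback: the NameNode is still in safe mode, so the WebHDFS SETPERMISSION call is rejected with 403. Safe mode normally clears on its own once enough block reports arrive; it can be inspected or exited manually with `hdfs dfsadmin -safemode get` and `hdfs dfsadmin -safemode leave`. The arithmetic in the message can be checked directly against the numbers it reports (the threshold is the HDFS property dfs.namenode.safemode.threshold-pct):

```python
import math

# Numbers taken from the SafeModeException message above.
total_blocks = 435
reported_blocks = 425
threshold = 0.9900  # dfs.namenode.safemode.threshold-pct

# Blocks that must be reported before the NameNode leaves safe mode.
needed = math.ceil(total_blocks * threshold)
additional = needed - reported_blocks
print(additional)  # matches the "needs additional 6 blocks" in the message
```

If the DataNodes are healthy the count converges by itself; forcing safe mode off before the missing blocks are reported risks operating on an incomplete namespace, so check `hdfs dfsadmin -report` first.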
Labels: