Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1964 | 06-15-2020 05:23 AM |
| | 16020 | 01-30-2020 08:04 PM |
| | 2108 | 07-07-2019 09:06 PM |
| | 8244 | 01-27-2018 10:17 PM |
| | 4664 | 12-31-2017 10:12 PM |
01-01-2018
08:29 PM
In our Ambari cluster (version 2.6) we have 3 master nodes, with the Spark2 Thrift Server installed on master01 and master03. When we start both of them, or just one, the Spark Thrift Server runs for a short time (about 30 seconds) and then fails back. In the ambari-agent log we can see the following details about the Spark Thrift Server. What could be the problem?

INFO 2018-01-01 22:14:51,827 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:02,732 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:06,054 StatusCommandsExecutor.py:65 - Adding STATUS_COMMAND for component SPARK2_THRIFTSERVER of service SPARK2 of cluster hdp to the queue.
INFO 2018-01-01 22:15:06,153 StatusCommandsExecutor.py:65 - Adding STATUS_COMMAND for component SPARK2_CLIENT of service SPARK2 of cluster hdp to the queue.
INFO 2018-01-01 22:15:06,501 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:13,347 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2018-01-01 22:15:23,356 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
Labels:
- Apache Ambari
- Apache Hadoop
- Apache Spark
01-01-2018
05:27 PM
We have 3 worker machines (DataNode machines) in our Ambari cluster. Each worker machine has the following components: DataNode (HDFS), Metrics Monitor, and NodeManager (YARN). First, how do we stop/start all of these components on a single worker machine only? Second, is it possible to stop/start these components on all worker machines at once, instead of stopping/starting each worker machine individually?
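A sketch of how this can be done through Ambari's REST API, which supports changing the state of every host component on a host in a single call (setting `state` to `INSTALLED` stops the components, `STARTED` starts them). The Ambari URL, cluster name, and worker host names below are placeholders for illustration; adjust them to your environment and verify against your Ambari version.

```python
import json

AMBARI = "http://ambari-host:8080"   # hypothetical Ambari server address
CLUSTER = "hdp"                      # hypothetical cluster name

def host_components_request(host, state):
    """Build the URL and JSON body for a bulk stop/start of all
    components on one host.

    state = "INSTALLED" stops the components, "STARTED" starts them.
    """
    url = "%s/api/v1/clusters/%s/hosts/%s/host_components" % (AMBARI, CLUSTER, host)
    body = {
        "RequestInfo": {"context": "Stop/start all components on %s" % host},
        "Body": {"HostRoles": {"state": state}},
    }
    return url, json.dumps(body)

# Looping over the workers covers the "all worker machines" case:
for worker in ["worker01", "worker02", "worker03"]:  # hypothetical host names
    url, body = host_components_request(worker, "INSTALLED")
    # send with e.g.:
    #   curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d "$body" "$url"
    print(url)
```

Each PUT returns a request id that can be polled under `/api/v1/clusters/<cluster>/requests/<id>` to watch progress.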
Labels:
- Apache Ambari
- Apache Hadoop
01-01-2018
05:11 PM
We want to understand the required procedure before adding disks to each worker machine. We have an Ambari cluster (version 2.6) with 3 DataNode machines (worker machines), and we want to add 5 new disks to each worker. Before adding the new disks, must we stop the components on each worker machine (the components are DataNode, Metrics Monitor, and NodeManager), or should we instead restart all affected services/components in the entire cluster?
Labels:
- Apache Ambari
- Apache Hadoop
- Apache YARN
01-01-2018
01:48 PM
We want to capture the values of yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs via the API, in order to use these values in a bash script. For example, for the values shown in the Ambari GUI: what is the API syntax for retrieving yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs?
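A sketch of the usual two-step flow against Ambari's config API (endpoint layout is Ambari's standard REST layout; the cluster name and the sample response below are hypothetical): first `GET /api/v1/clusters/<cluster>?fields=Clusters/desired_configs` to find the current tag for the `yarn-site` config type, then `GET /api/v1/clusters/<cluster>/configurations?type=yarn-site&tag=<tag>` to fetch the properties. The extraction a script would do on the second response looks like this:

```python
# Parse a /configurations?type=yarn-site response and pull out the
# two NodeManager directory properties.

def extract_properties(config_response, keys):
    """Pull selected properties out of a configurations-API reply."""
    props = config_response["items"][0]["properties"]
    return {k: props[k] for k in keys}

# A trimmed, hypothetical example of the second response:
sample = {
    "items": [{
        "type": "yarn-site",
        "tag": "version1514800000000",
        "properties": {
            "yarn.nodemanager.local-dirs": "/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local",
            "yarn.nodemanager.log-dirs": "/grid/sdb/hadoop/yarn/log,/grid/sdc/hadoop/yarn/log",
        },
    }]
}

wanted = extract_properties(
    sample, ["yarn.nodemanager.local-dirs", "yarn.nodemanager.log-dirs"])
print(wanted["yarn.nodemanager.local-dirs"])
```

From bash, the same two GETs can be issued with `curl -u admin:admin` and the properties picked out of the JSON.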
Labels:
- Apache Ambari
- Apache Hadoop
- Apache YARN
12-31-2017
10:19 PM
We have an Ambari cluster (version 2.6) with 3 worker machines (DataNode machines). Each worker machine now has 10 disks (after we added 5 disks to each worker machine). How can we verify from the Ambari GUI that all disks are OK and that Ambari recognizes the disk sizes? (We can already see them with df -h on each worker machine.)
Labels:
- Apache Ambari
- Apache Hadoop
12-31-2017
10:12 PM
You need to configure the file /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist with all of the mount points, then start the DataNode on each worker machine.
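For reference, a sketch of what the file looks like once updated. The paths here are hypothetical examples, and the one `data_dir,mount_point` pair per line layout is our reading of the file, so compare against your own copy before editing:

```
# data_dir,mount_point
/grid/sdg/hadoop/hdfs/data,/grid/sdg
/grid/sdh/hadoop/hdfs/data,/grid/sdh
/grid/sdi/hadoop/hdfs/data,/grid/sdi
```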
12-31-2017
07:26 PM
We have an Ambari cluster (version 2.6) with 3 worker machines (DataNode machines). Each worker machine had 5 disks (each disk is 20G), and we upgraded each worker machine to 10 disks as follows:

1. On each worker we created a filesystem on each of the 5 new disks and mounted the partitions.
2. We added the new disks to dfs.datanode.data.dir, yarn.nodemanager.local-dirs, and yarn.nodemanager.log-dirs (in the Ambari GUI).
3. We restarted HDFS and YARN.

But when we restart HDFS on the workers (DataNode machines), we get the following errors:

2017-12-31 19:01:52,509 -
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
Directory /grid/sdg/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdg .
Directory /grid/sdh/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdh .
Directory /grid/sdi/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdi .
Directory /grid/sdj/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdj .
Directory /grid/sdk/hadoop/hdfs/data became unmounted from / . Current mount point: /grid/sdk . Please ensure that mounts are healthy. If the mount change was intentional, you can update the contents of /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist.
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
***** WARNING ***** WARNING ***** WARNING ***** WARNING ***** WARNING *****
So for now all the new disks are empty, without a hadoop folder under /grid/sdX. Please advise how to continue from this stage. (We already restarted the ambari-agent and the ambari-server, without any improvement.)
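As the warning itself says, if the mount change was intentional the fix is to update /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist so that each data dir maps to its current mount point. A minimal sketch of regenerating those entries, assuming the file holds one `data_dir,mount_point` pair per line (that layout is our assumption, and the data-dir list below is an example):

```python
import os

def find_mount_point(path):
    """Walk up from `path` until we reach a mount point (or /)."""
    path = os.path.abspath(path)
    while path != os.sep and not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

def hist_lines(data_dirs):
    """Produce one "data_dir,mount_point" line per configured data dir."""
    return ["%s,%s" % (d, find_mount_point(d)) for d in data_dirs]

# Example data dirs; substitute the values from dfs.datanode.data.dir.
example_dirs = ["/grid/sdg/hadoop/hdfs/data", "/grid/sdh/hadoop/hdfs/data"]
for line in hist_lines(example_dirs):
    print(line)
```

Run on the worker itself (so the real mounts are visible), write the lines into the .hist file, then restart the DataNode.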
Labels:
- Apache Ambari
- Apache Hadoop
12-21-2017
04:54 PM
@Joe P, we are asking about SSH after the Ambari server and all nodes are already installed (it is clear that SSH is needed during installation). The question is whether SSH is still needed after the Ambari cluster is already installed.
12-21-2017
03:52 PM
Hi all, we installed an Ambari cluster with 3 master machines (the Ambari server is installed on master02), 25 worker machines, and 5 Kafka machines. Does the ambari-server need SSH access to all the other machines in the cluster? And are all the files under /root/.ssh/ on the ambari-server machine used by the Ambari cluster?
Labels:
- Apache Ambari
- Apache Hadoop
12-20-2017
07:20 PM
We cannot start the App Timeline Server; we get the following logs when we try to start it. How can we fix this issue?

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 155, in <module>
ApplicationTimelineServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 762, in restart
self.start(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 44, in start
self.configure(env) # FOR SECURITY
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 117, in locking_configure
original_configure(obj, *args, **kw)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 55, in configure
yarn(name='apptimelineserver')
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 337, in yarn
mode=0755
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 555, in action_create_on_execute
self.action_delayed("create")
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 552, in action_delayed
self.get_hdfs_resource_executor().action_delayed(action_name, self)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 288, in action_delayed
self._set_mode(self.target_status)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in _set_mode
self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 199, in run_command
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT 'http://master01.sys65.com:50070/webhdfs/v1/ats/done?op=SETPERMISSION&user.name=hdfs&permission=755'' returned status_code=403.
{
"RemoteException": {
"exception": "RetriableException",
"javaClassName": "org.apache.hadoop.ipc.RetriableException",
"message": "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 425 needs additional 6 blocks to reach the threshold 0.9900 of total blocks 435.\nThe number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
}
}
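The root cause is the RemoteException at the end of the traceback: the NameNode is still in safe mode, so the WebHDFS SETPERMISSION call is rejected with 403. Safe mode normally clears on its own once enough block reports arrive; it can be inspected or exited manually with `hdfs dfsadmin -safemode get` and `hdfs dfsadmin -safemode leave`. The arithmetic in the message can be checked directly against the numbers it reports (the threshold is the HDFS property dfs.namenode.safemode.threshold-pct):

```python
import math

# Numbers taken from the SafeModeException message above.
total_blocks = 435
reported_blocks = 425
threshold = 0.9900  # dfs.namenode.safemode.threshold-pct

# Blocks that must be reported before the NameNode leaves safe mode.
needed = math.ceil(total_blocks * threshold)
additional = needed - reported_blocks
print(additional)  # matches the "needs additional 6 blocks" in the message
```

If the DataNodes are healthy the count converges by itself; forcing safe mode off before the missing blocks are reported risks operating on an incomplete namespace, so check `hdfs dfsadmin -report` first.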
Labels: