03-03-2016
04:37 PM
4 Kudos
1. If using an Ambari version before 2.1.2, disable HBase per-region metrics.
On the Ambari server host, edit the following files under /var/lib/ambari-server/resources/:

common-services/HBASE/0.96.0.2.0/package/templates/HBASE/hadoop-metrics2-hbase.properties-GANGLIA-MASTER.j2
common-services/HBASE/0.96.0.2.0/package/templates/HBASE/hadoop-metrics2-hbase.properties-GANGLIA-RS.j2

and add the following lines at the end, just before the closing '{% endif %}':

*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
hbase.*.source.filter.exclude=*Regions*

Then do a rolling restart of the HBase RegionServers.
Note: This does not disable RegionServer metrics; it only disables the per-region / per-table metrics collected at the region level. These are disabled by default from Ambari 2.1.2 onwards.
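The file edit above can be scripted. A minimal sketch, assuming GNU sed and the template directory quoted above (the function name is mine, not an Ambari tool; back up the files first):

```shell
# Append the GlobFilter lines just before the closing '{% endif %}' of each
# Ganglia metrics template. GNU sed assumed; a .bak copy is kept per file.
disable_per_region_metrics() {
  local dir="$1"   # e.g. /var/lib/ambari-server/resources/common-services/HBASE/0.96.0.2.0/package/templates/HBASE
  local f
  for f in "$dir"/hadoop-metrics2-hbase.properties-GANGLIA-MASTER.j2 \
           "$dir"/hadoop-metrics2-hbase.properties-GANGLIA-RS.j2; do
    cp "$f" "$f.bak"   # keep a backup copy before editing
    sed -i 's/^{% endif %}$/*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter\nhbase.*.source.filter.exclude=*Regions*\n{% endif %}/' "$f"
  done
}
```

Run it once on the Ambari server, then do the rolling restart of the RegionServers as described.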
2. Tune AMS configs:
Find out the heap available on the AMS Collector host, then change the following settings based on the available memory:

ams-hbase-env :: hbase_master_heapsize = 8192m (or 16384m if available)
ams-hbase-env :: hbase_master_xmn_size = 1024m
ams-hbase-env :: regionserver_xmn_size = 1024m
ams-hbase-site :: phoenix.query.spoolThresholdBytes = 25165824 (24 MB, up from the 12 MB default)
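If you prefer the command line over the Ambari UI, Ambari ships a configs.sh helper under /var/lib/ambari-server/resources/scripts that can set these values. A hedged sketch: the cluster name 'c1', admin/admin credentials, and localhost are placeholders, and the commands are only echoed here so they can be reviewed before actually running them:

```shell
# Echo the configs.sh invocations for the AMS settings above instead of
# executing them, so they can be reviewed first. Cluster name 'c1' and the
# admin/admin credentials are placeholders.
CONFIGS=/var/lib/ambari-server/resources/scripts/configs.sh
set_ams_config() {
  echo "$CONFIGS -u admin -p admin set localhost $1 $2 $3 $4"
}
set_ams_config c1 ams-hbase-env  hbase_master_heapsize             8192m
set_ams_config c1 ams-hbase-env  hbase_master_xmn_size             1024m
set_ams_config c1 ams-hbase-env  regionserver_xmn_size             1024m
set_ams_config c1 ams-hbase-site phoenix.query.spoolThresholdBytes 25165824
```

Drop the echo (or paste the printed commands) once you are satisfied; a restart of AMS is still needed for the changes to take effect.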
3. AMS data storage
If using embedded mode, change the write paths for:

ams-hbase-site :: hbase.rootdir
ams-hbase-site :: hbase.tmp.dir

so that they are placed on the fastest possible disk. It is also better to keep hbase.tmp.dir in a location different from hbase.rootdir.

After completing the above, stop AMS from Ambari. Once stopped, ensure that the processes are actually gone with 'ps aux | grep ams'. If any processes are still around, kill them and clean up the /var/run/ambari-metrics-collector/*.pid files. Then restart the AMS services from Ambari.
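The stop-and-clean sequence above can be sketched as follows (the pid directory is the one named in the post; the function name is mine, and the process check is a crude grep, so review before killing anything):

```shell
# Verify no AMS process survived the Ambari stop, then remove stale pid
# files so the next start via Ambari is clean. Pass a different pid
# directory if your layout differs.
ams_clean_pid_files() {
  local pid_dir="${1:-/var/run/ambari-metrics-collector}"
  # [a]ms avoids the grep matching its own process entry
  if ps aux | grep '[a]ms' >/dev/null 2>&1; then
    echo "WARNING: AMS processes still running; kill them before cleaning pid files" >&2
  fi
  # remove stale pid files left behind by the stopped collector
  rm -f "$pid_dir"/*.pid
}
```

After the cleanup, restart the AMS services from Ambari as usual.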
03-02-2016
01:52 PM
3 Kudos
With recent improvements in Ambari, an upgrade can be done easily using either Rolling Upgrade or Express Upgrade. But there are times when the upgrade/downgrade gets stuck, either because not all precautions were followed or due to product issues. When an upgrade gets stuck, it typically sits in an 'Upgrade Paused' status, or shows no status at all. At this point, take care not to restart the ambari-server without consulting Technical Support. The current status of the upgrade can be checked using several methods:
Ambari log files
Ambari API URLs
The Ambari database

1. Ambari log files

Review the Ambari log files for any errors or exceptions while the upgrade is in progress.

2. Use Ambari API URLs to find the failures

http://<ambari-server>:8080/api/v1/clusters/c1/upgrades

For example, http://vcert1.novalocal:8080/api/v1/clusters/VCertify/upgrades shows all the upgrade and downgrade attempts. If this is an upgrade failure, identify the latest upgrade attempt number and query it:

http://vcert1.novalocal:8080/api/v1/clusters/VCertify/upgrades/119?fields=upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/context,upgrade_groups/UpgradeGroup/title

This lists all the actions taken as part of upgrade attempt 119. Review the output for JSON items whose status is not 'COMPLETED'. That shows which items failed to reach COMPLETED status, and you can troubleshoot from there.

3. Ambari database

Note: Care has to be taken while working in the Ambari DB. It is mandatory to back up the database before any upgrade/downgrade. The following tables in the Ambari database are the ideal starting point for troubleshooting:

repo_version - contains all the repo versions installed in the system

mysql> select repo_version_id, stack_id, version, display_name from repo_version;
+-----------------+----------+--------------+------------------+
| repo_version_id | stack_id | version | display_name |
+-----------------+----------+--------------+------------------+
| 1 | 4 | 2.3.0.0-2557 | HDP-2.3.0.0-2557 |
| 2 | 4 | 2.3.2.0-2950 | HDP-2.3.2.0-2950 |
| 51 | 4 | 2.3.4.0-3485 | HDP-2.3.4.0 |
+-----------------+----------+--------------+------------------+
3 rows in set (0.00 sec)

cluster_version - contains the current versions in the cluster (installed / current / upgrading, etc.)

mysql> select * from cluster_version;
+----+-----------------+------------+-------------+---------------+---------------+------------+
| id | repo_version_id | cluster_id | state | start_time | end_time | user_name |
+----+-----------------+------------+-------------+---------------+---------------+------------+
| 1 | 1 | 2 | OUT_OF_SYNC | 1448369111902 | 1448369112183 | _anonymous |
| 2 | 2 | 2 | UPGRADING | 1448521029573 | 1452063126443 | admin |
| 51 | 51 | 2 | CURRENT | 1450860003969 | 1451397592558 | admin |
+----+-----------------+------------+-------------+---------------+---------------+------------+
3 rows in set (0.00 sec)
host_version - contains the details about the versions installed on a given host

mysql> select * from host_version;
+----+-----------------+---------+-------------+
| id | repo_version_id | host_id | state |
+----+-----------------+---------+-------------+
| 1 | 1 | 4 | OUT_OF_SYNC |
| 2 | 1 | 1 | OUT_OF_SYNC |
| 3 | 1 | 2 | OUT_OF_SYNC |
| 4 | 1 | 3 | OUT_OF_SYNC |
| 5 | 2 | 1 | UPGRADED |
| 6 | 2 | 3 | UPGRADED |
| 7 | 2 | 2 | UPGRADED |
| 8 | 2 | 4 | OUT_OF_SYNC |
| 51 | 51 | 3 | CURRENT |
| 52 | 51 | 2 | CURRENT |
| 53 | 51 | 4 | CURRENT |
| 54 | 51 | 1 | CURRENT |
+----+-----------------+---------+-------------+
12 rows in set (0.05 sec)
hostcomponentstate - shows the current version / state of a given component or service

mysql> select * from hostcomponentstate;
+-----+------------+------------------------+--------------+------------------+---------------+---------+----------------+---------------+----------------+
| id | cluster_id | component_name | version | current_stack_id | current_state | host_id | service_name | upgrade_state | security_state |
+-----+------------+------------------------+--------------+------------------+---------------+---------+----------------+---------------+----------------+
| 2 | 2 | NAMENODE | 2.3.4.0-3485 | 4 | STARTED | 4 | HDFS | NONE | UNSECURED |
| 3 | 2 | HISTORYSERVER | 2.3.4.0-3485 | 4 | STARTED | 1 | MAPREDUCE2 | NONE | UNSECURED |
| 4 | 2 | APP_TIMELINE_SERVER | 2.3.4.0-3485 | 4 | STARTED | 4 | YARN | NONE | UNSECURED |
| 5 | 2 | RESOURCEMANAGER | 2.3.4.0-3485 | 4 | STARTED | 4 | YARN | NONE | UNSECURED |
| 6 | 2 | WEBHCAT_SERVER | 2.3.4.0-3485 | 4 | INSTALLED | 2 | HIVE | NONE | UNSECURED |
| 8 | 2 | HIVE_SERVER | 2.3.4.0-3485 | 4 | INSTALLED | 2 | HIVE | NONE | UNSECURED |
Reviewing the above tables gives you an idea of the current state of the upgrade/downgrade. Further troubleshooting depends on the current state of the Ambari upgrade or downgrade, but the above should give a fair clue for working through the issues.
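The API check in step 2 can be scripted. A minimal sketch using a crude grep-based JSON scrape (the host, cluster, and attempt number are the example values from above; admin/admin is a placeholder, and a JSON-aware tool would be more robust than grep):

```shell
# Print every "context"/"status" line from an upgrade-attempt JSON payload,
# dropping the COMPLETED ones, so the stuck items stand out.
list_incomplete() {
  grep -E '"(context|status)"' | grep -v '"COMPLETED"'
}

# Example usage against the attempt from the post:
#   curl -s -u admin:admin \
#     'http://vcert1.novalocal:8080/api/v1/clusters/VCertify/upgrades/119?fields=upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/context' \
#     | list_incomplete
```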
12-12-2017
12:08 PM
@Abraham Johnson @vpoornalingam There is still another reason (and cure) for this scenario (HDP-2.6.2.0-205): Ambari may be looking for the pid files in the wrong place. In my case the pid files were actually located at:

/var/run/hadoop/hdfs-<clustername>/hadoop-hdfs-<clustername>-namenode.pid

while the ambari-agent would look at:

/var/run/hadoop/hdfs/hadoop-hdfs-hdfs-namenode.pid

In this state, with both the directory and the pid file name wrong, Ambari does not detect a running HDFS service, and you are also not able to (re)start it. The pid file location is deduced from this snippet in hadoop-env.sh:

export HADOOP_PID_DIR={{hadoop_pid_dir_prefix}}/$USER

I have yet to find out why Ambari decided to change the value of $USER all of a sudden.
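One way to spot this mismatch is to search under the prefix from hadoop-env.sh for whatever $USER actually expanded to. A sketch (the function name is mine; the prefix defaults to /var/run/hadoop as in this cluster):

```shell
# List every NameNode pid file under the pid-dir prefix, regardless of what
# $USER expanded to in HADOOP_PID_DIR={{hadoop_pid_dir_prefix}}/$USER, so the
# actual location can be compared with the one ambari-agent expects.
find_namenode_pid() {
  local prefix="${1:-/var/run/hadoop}"
  # match any user/cluster suffix in both the directory and the file name
  find "$prefix" -name 'hadoop-hdfs-*-namenode.pid' 2>/dev/null
}
```

If the path printed differs from the one the agent checks, compare hadoop_pid_dir_prefix and the HDFS service user in your Ambari configuration.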