Member since: 09-29-2015 | Posts: 63 | Kudos Received: 107 | Solutions: 13
05-03-2017 10:01 PM
Problem: There's a known bug in Ambari 2.4 and 2.5 that causes "ambari-server upgrade" to fail if the agent RPM is not upgraded first. Example stack trace:

Using python /usr/bin/python
Setup ambari-server
Traceback (most recent call last):
  File "/usr/sbin/ambari-server.py", line 33, in <module>
    from ambari_server.dbConfiguration import DATABASE_NAMES, LINUX_DBMS_KEYS_LIST
  File "/usr/lib/python2.6/site-packages/ambari_server/dbConfiguration.py", line 28, in <module>
    from ambari_server.serverConfiguration import decrypt_password_for_alias, get_ambari_properties, get_is_secure, \
  File "/usr/lib/python2.6/site-packages/ambari_server/serverConfiguration.py", line 36, in <module>
    from ambari_commons.os_utils import run_os_command, search_file, set_file_permissions, parse_log4j_file
ImportError: cannot import name parse_log4j_file

Cause: This occurs because os_utils.py and the other Python files under /usr/lib/ambari-agent/lib/ambari_commons are upgraded by the agent's RPM and are used by the server's scripts to determine which database to use.
Solution: If ambari-agent is also present on the Ambari Server host, run "yum upgrade ambari-agent" (or the equivalent for your OS) before running "ambari-server upgrade".
Note: Always back up your Ambari database before the upgrade.
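As a sketch, the fix is purely a matter of ordering: make sure the agent RPM is at least as new as the server before running the server upgrade. The version strings below are placeholder examples; on a real host you would read them with rpm -q, as the comments note.

```shell
# Hypothetical pre-flight check before "ambari-server upgrade":
# ensure the ambari-agent RPM is not older than ambari-server.
# On a real host these would come from:
#   agent_ver=$(rpm -q --qf '%{VERSION}' ambari-agent)
#   server_ver=$(rpm -q --qf '%{VERSION}' ambari-server)
agent_ver="2.4.2.0"
server_ver="2.5.0.3"
# sort -V does a version-aware sort; the first line is the oldest version.
oldest=$(printf '%s\n%s\n' "$agent_ver" "$server_ver" | sort -V | head -1)
if [ "$oldest" = "$agent_ver" ] && [ "$agent_ver" != "$server_ver" ]; then
  echo "agent is older: run 'yum upgrade ambari-agent' first"
fi
```

If the two versions already match, the check prints nothing and it is safe to proceed.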
01-31-2017 01:47 AM - 10 Kudos
Whether you're creating an Ambari cluster from scratch, taking over an existing cluster, or growing your cluster over time, it is imperative to tune Ambari and MySQL to work at a large scale of 1000-3000 Ambari Agents.

Ambari Server Configs
First, increase the memory used by Ambari. For large clusters, 8 GB of memory should be sufficient. If you have more than 10 concurrent users, increase it to 16 GB.
Edit /var/lib/ambari-server/ambari-env.sh and change the -Xms and -Xmx settings.
export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Xms2048m -Xmx8192m"
Edit /etc/ambari-server/conf/ambari.properties and set the following configs:
# The size of the Jetty connection pool used for handling incoming Ambari Agent requests.
# 10 hosts => 25
# 50 hosts => 35
# 100 hosts => 75
# 500 hosts => 100
agent.threadpool.size.max=100
# Determines whether current alerts should be cached.
# Enabling this can increase performance on large clusters, but can also result in lost alert data
# if the cache is not flushed frequently.
alerts.cache.enabled=true
# The size of the alert cache.
# Less than 50 hosts => 50000
# More than 50 hosts => 100000
alerts.cache.size=100000
# The number of threads used to handle alerts received from the Ambari Agents.
# The value should be increased as the size of the cluster increases.
# Less than 50 hosts => 2
# More than 50 hosts => 4
alerts.execution.scheduler.maxThreads=4
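When scripting these edits across environments, a small update-or-append helper avoids accumulating duplicate keys in ambari.properties. The set_prop function below is a hypothetical helper of mine, not an Ambari tool, and it is exercised against a scratch file rather than the live config:

```shell
# Hypothetical helper: set a key in a properties file, replacing an
# existing line if present, otherwise appending a new one.
set_prop() {
  key=$1; val=$2; file=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${val}|" "$file"
  else
    echo "${key}=${val}" >> "$file"
  fi
}

# Exercise it on a throwaway file, not /etc/ambari-server/conf/ambari.properties.
f=$(mktemp)
echo "agent.threadpool.size.max=25" > "$f"
set_prop agent.threadpool.size.max 100 "$f"   # replaces the existing line
set_prop alerts.cache.enabled true "$f"       # appends a new line
cat "$f"
```

Running the helper twice with the same key is idempotent, which makes it safe to bake into a provisioning script.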
After performing these changes, restart Ambari Server.

Move an existing Ambari DB from a spinning disk to an SSD
It is highly recommended to host the Ambari database on a solid-state drive, since it is much faster.
Check the throughput of the disk that Ambari's database (Postgres, MySQL, MariaDB, or Oracle) is on. Ideally, it should be a solid-state drive, or at least support 200 IOPS, and be either on the same host as Ambari or only 1-2 network hops away.
Type  Details               IOPS     Throughput
HDD   10,000 rpm SAS drive  175-210  100 MB/s
SSD   solid-state           500+     500+ MB/s
1. ambari-server stop
2. Take a backup of the Ambari database:
mysqldump -u root ambari > /tmp/ambari.sql
3. Stop the MySQL server, copy its data to the SSD, and update the data directory in my.cnf.
service mysqld stop
cp -R -p /var/lib/mysql /mnt/disks/ssd/mysql
cat /etc/my.cnf
sed -i 's|/var/lib/mysql|/mnt/disks/ssd/mysql|g' /etc/my.cnf
4. Create a symlink for the socket file and start MySQL.
ln -s /mnt/disks/ssd/mysql/mysql.sock /var/lib/mysql/mysql.sock
service mysqld start
5. Ensure Ambari DB is accessible.
mysql -u root -p
show databases;
use ambari;
show tables;
select count(*) from hosts;
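The sed edit in step 3 can be rehearsed safely on a scratch copy before touching the real /etc/my.cnf; using '|' as the s-command delimiter avoids escaping every slash in the paths. The two-line file below is a minimal stand-in, not a full my.cnf:

```shell
# Rehearse the datadir/socket rewrite on a throwaway copy of my.cnf.
cfg=$(mktemp)
printf 'datadir=/var/lib/mysql\nsocket=/var/lib/mysql/mysql.sock\n' > "$cfg"
# Same substitution as in step 3, but with '|' delimiters so the
# slashes in the paths need no backslash escaping.
sed -i 's|/var/lib/mysql|/mnt/disks/ssd/mysql|g' "$cfg"
cat "$cfg"
```

Once the output looks right, the same sed command can be pointed at /etc/my.cnf.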
MySQL Optimizations
First and foremost, if you're on an older version of MySQL, try to update it to MySQL 5.6 or 5.7, which include many performance improvements.
Connect to the MySQL DB and inspect these variables. E.g.,
SHOW VARIABLES LIKE 'name';
These suggested values assume that the Ambari database is the only one on the MySQL server. If you host other databases on the same MySQL server, add these amounts on top of your existing settings.
WARNING: Never stop MySQL server while Ambari Server is running.
Variable                                           Suggested Value
innodb_log_buffer_size                             512M
innodb_buffer_pool_size                            16G
innodb_file_io_threads (deprecated in MySQL 5.5)   16
innodb_log_file_size                               5M
innodb_thread_concurrency                          32
join_buffer_size                                   512M
key_buffer_size                                    16G
max_connections                                    500
max_allowed_packet                                 1024M
max_heap_table_size                                64M
query_cache_limit                                  16M
query_cache_size                                   512M
read_rnd_buffer_size                               128M
sort_buffer_size                                   128M
table_open_cache                                   1024
thread_cache_size                                  128
thread_stack                                       256K
To change these values:
1. Stop MySQL: service mysqld stop
2. Edit the configs in /etc/my.cnf , under the “[mysqld]” section (note, it may be in a different location).
3. Start MySQL: service mysqld start
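Scripted, steps 1-3 amount to writing the chosen values under the "[mysqld]" section. The sketch below appends to a scratch file rather than the live /etc/my.cnf, and shows only a few of the table's values:

```shell
# Append a subset of the suggested settings under [mysqld].
# Target a scratch file here; point cfg at /etc/my.cnf for real use.
cfg=$(mktemp)
cat >> "$cfg" <<'EOF'
[mysqld]
innodb_buffer_pool_size=16G
innodb_log_buffer_size=512M
max_connections=500
max_allowed_packet=1024M
EOF
# Count the key=value lines written.
grep -c '=' "$cfg"
```

Note that if a key already exists elsewhere in the file, MySQL uses the last occurrence, so appending is a reasonable way to override earlier defaults.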
04-25-2016 11:40 PM - 7 Kudos
When performing a Rolling or Express Upgrade, failures can naturally happen because large clusters are bound to have problematic hosts. Here are 10 easy tips to prevent, diagnose, and fix errors.

Before upgrading the stack ...
1. Always upgrade Ambari to the most recent version, even if it's a dot release.
Often, there are fixes and optimizations that make the stack upgrade smoother.
2. Ensure all services are up, service checks are passing, there are no critical alerts, etc.
This helps ensure that the cluster is fully operational and helps to isolate any failures.
3. Pre-Install the bits and make sure all hosts have enough disk space. You can check that the version is found on all hosts. E.g.,
hdp-select versions | grep 2.5.0.0 | sort | tail -1
4. Do not ignore warnings. Starting in Ambari 2.2.2, there's a flag in the ambari.properties file that allows users to bypass PreCheck errors; make sure it is either not present or set to false:
stack.upgrade.bypass.prechecks=false
5. Take a backup of the Ambari database. E.g.,
pg_dump -U ambari ambari > /tmp/ambari_bk.psql
mysqldump -u ambari ambari > /tmp/ambari_bk.mysql
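A small refinement on tip 5 (my own suggestion, not from the original): timestamp the dump path so a retried upgrade attempt never overwrites the last good backup. The sketch only builds and echoes the commands; it does not run the dumps:

```shell
# Build a timestamped backup path, e.g. /tmp/ambari_bk-20170503-220101.sql
stamp=$(date +%Y%m%d-%H%M%S)
backup="/tmp/ambari_bk-${stamp}.sql"
# Echo rather than execute, since the dump commands need a live database:
echo "pg_dump -U ambari ambari > ${backup}"     # Postgres
echo "mysqldump -u ambari ambari > ${backup}"   # MySQL
```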
In the middle of Upgrade ...
6. Rolling Upgrade will pause after 30% of the DataNodes have been upgraded. This allows the customer to run additional jobs and ensure that the partial upgrade is still healthy.
7. If a failure occurs, click on "Retry" and make sure that all other dependent services and masters are up.
Often, a retry will work if the previous command failed due to a timeout, a network glitch, or a host going down and coming back up. Capture any logs from both the component that failed and the ambari-agent, at /var/lib/ambari-agent/data/output-*.txt and /var/lib/ambari-agent/data/errors-*.txt
8. If the failure requires changing configs or restarting a component on a host, then click on the "Pause" button. This will temporarily suspend the Upgrade/Downgrade and allow the user to change configs, execute other commands, such as restarting services, running service checks, etc. Once done, click on the "Resume" button.
CAUTION: do not ever add or move hosts, add or delete services, enable High Availability, or change topology while the upgrade is in progress.
If you cannot Finalize ...
9. Find the problematic hosts and components. In Ambari 2.0 - 2.2, you can run:
SELECT repo_version_id, version, display_name FROM repo_version;
-- The state for your version may be UPGRADING or UPGRADED.
-- UPGRADING: some component on a host is still not on the newer version
-- UPGRADED: all components on all hosts are on the newer version
SELECT version, state FROM cluster_version cv JOIN repo_version rv ON cv.repo_version_id = rv.repo_version_id ORDER BY version DESC;
-- Find how many hosts are in each state
SELECT version, state, COUNT(*) FROM host_version hv JOIN repo_version rv ON hv.repo_version_id = rv.repo_version_id GROUP BY version, state ORDER BY version DESC, state;
-- Find components on hosts still not on the newer version
SELECT service_name, component_name, version, host_name FROM hostcomponentstate hcs JOIN hosts h ON hcs.host_id = h.host_id WHERE service_name NOT IN ('AMBARI_METRICS', 'KERBEROS') and component_name NOT IN ('ZKFC') ORDER BY version, service_name, component_name, host_name;
On these hosts, run the following,
1. hdp-select set all <new_version>
2. Restart any components still on the older version (you may have to click on the "Pause" button first).
Once all hosts are on the newer version, then the Cluster Version status should transition to UPGRADED; this will allow you to Finalize the upgrade.
10. If you still run into problems, gather all of the logs and the results of the SQL queries, and either email Hortonworks Support or the mailing list of the component that failed.
Here's another useful query.
Postgres:
SELECT u.upgrade_id, u.direction, u.from_version, u.to_version, hrc.request_id, hrc.task_id, substr(g.group_title, 0, 30), substr(i.item_text, 0, 30), hrc.status
FROM upgrade_group g JOIN upgrade u ON g.upgrade_id = u.upgrade_id
JOIN upgrade_item i ON i.upgrade_group_id = g.upgrade_group_id
JOIN host_role_command hrc ON hrc.stage_id = i.stage_id AND hrc.request_id = u.request_id
ORDER BY hrc.task_id;
MySQL:
SELECT u.upgrade_id, u.direction, u.from_version, u.to_version, hrc.request_id, hrc.task_id, left(g.group_title, 30), left(i.item_text, 30), hrc.status
FROM upgrade_group g JOIN upgrade u ON g.upgrade_id = u.upgrade_id
JOIN upgrade_item i ON i.upgrade_group_id = g.upgrade_group_id
JOIN host_role_command hrc ON hrc.stage_id = i.stage_id AND hrc.request_id = u.request_id
ORDER BY hrc.task_id;
Have fun upgrading.
03-31-2016 01:32 AM
I believe that in either Ambari 2.1.2 or 2.2.0 we introduced a button to "Pause" the upgrade. This essentially aborts all of the pending tasks from the upgrade so that you can perform operations on your own, such as restarting services. In this "paused" state, we recommend only starting/stopping/restarting services and changing configs; anything that involves changing the topology (adding hosts, adding services, HA, etc.) is likely to result in problems. Because the RU/EU is still technically active (although in an ABORTED state), there will be a button to "Resume" it.
12-17-2015 02:41 AM - 32 Kudos
One of the most gargantuan tasks for any cluster administrator is upgrading the bits, since it is a tedious, risky, and inherently complex process that can take days. Ambari comes to the rescue with two features, Rolling Upgrade (RU) and Express Upgrade (EU), aimed at upgrading the HDP cluster with a couple of clicks. For starters, Rolling Upgrade was first released in Ambari 2.0.0 (March 2015), and the latest incarnation (as of October 2015) in Ambari 2.1.2 provides added robustness and a couple of goodies. Express Upgrade is set to be released in Ambari 2.2.0 (ETA is Dec 16, 2015). So how does it work? What are the gotchas? What do I need to know?
Overview: Both upgrade mechanisms update the bits and configurations of your cluster. The main difference is that RU upgrades one node at a time and keeps the cluster and all Hadoop jobs running, while EU stops all services, changes the version and configs, then starts all services in parallel. Therefore, RU is the prime candidate for environments that cannot take downtime, whereas EU is faster because it takes downtime. If your cluster is large, on the order of 500+ nodes, and you must finish the upgrade in a weekend, then EU is the clear choice.
Pre-Reqs:
Bits: In both cases, the user must first register a new repo and install the bits side-by-side. For example:
/usr/hdp/2.2.4.2-2 (current version)
/usr/hdp/2.3.0.0-2557 (version to upgrade to)
The good news is that this can be done ahead of time. Ambari will only install the bits for the services needed on each host; this saves disk space, since the full HDP stack can take up to 2.5 GB.
Pre-Checks: It is wise to make sure that you pass the Pre-Checks before attempting to start the upgrade. The pre-checks include:
- All hosts have the repo version installed
- All components are installed
- All hosts are heartbeating
- All hosts in maintenance state do not have any master components
- No services are in maintenance mode
- All services are up
- Hive is configured for dynamic discovery
- For RU, client retry is enabled for HDFS, Hive, and Oozie
- For RU, YARN has work-preserving restart enabled
- For RU, the MapReduce2 History Server has state-preserving mode enabled
- For RU, MapReduce jobs reference hadoop libraries from the distributed cache
- For RU, Tez jobs reference hadoop libraries from the distributed cache
- For RU, Hive has at least 2 Metastores
- For RU, NameNode High Availability is required and must use a dfs nameservice
If the user adds any services or hosts after installing the bits for a repo, they must redistribute the bits, since Ambari will mark that repo as "out_of_sync".
Orchestration: Rolling Upgrade orchestrates the services one at a time and restarts one component at a time. When a component is restarted, it is stopped on the old version, then started on the newer version. For HDP, this is done by calling "hdp-select set $comp $version" to set the symlink of the binary, and, on HDP 2.3 or higher, also calling "conf-select set-conf-dir $package $stack_version $conf_version" to change the symlink for the configuration.
The binary symlinks are controlled by:
/usr/hdp/current/$comp-name/ -> /usr/hdp/$version/$comp
The confs are controlled by two symlinks:
/etc/$comp/conf -> /usr/hdp/$version/$comp/conf -> /etc/$comp/$version/0
RU restarts the services from the bottom-up, i.e.:
1. ZK
2. Ranger
3. Core Masters: HDFS, MR2, YARN, HBase
4. Core Slaves: DataNode, RegionServer, NodeManager on each host
5. Auxiliary & Clients: Hive, Spark, Oozie, Falcon, Clients, Kafka, Knox, Storm, Slider, Flume, Accumulo
Before starting, Ambari will prompt the user to take backups of the database and will automatically snapshot the HDFS namespace. Throughout the process, Ambari will orchestrate Service Checks after critical points. At the end, Ambari will finalize the rolling upgrade and save the state in its database.
Express Upgrade has a slightly different orchestration: it stops all services on the current stack from the top-down, then changes to the new stack and applies configs, then starts services from the bottom-up. Both the stops and the starts are done in parallel.
Furthermore, because EU takes downtime, it does not require NameNode High Availability.
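The hdp-select flip described above boils down to re-pointing a symlink. Here's a sketch against a scratch directory standing in for /usr/hdp; the component and version names are just examples:

```shell
# Scratch directory standing in for /usr/hdp.
root=$(mktemp -d)
mkdir -p "$root/2.2.4.2-2/hadoop-hdfs" "$root/2.3.0.0-2557/hadoop-hdfs" "$root/current"
# Before the upgrade, "current" points at the old version:
ln -s "$root/2.2.4.2-2/hadoop-hdfs" "$root/current/hadoop-hdfs-namenode"
# "hdp-select set hadoop-hdfs-namenode 2.3.0.0-2557" effectively does this
# (-f replaces the old link, -n avoids descending into the linked directory):
ln -sfn "$root/2.3.0.0-2557/hadoop-hdfs" "$root/current/hadoop-hdfs-namenode"
readlink "$root/current/hadoop-hdfs-namenode"
```

Because only the link changes, the old version's bits remain on disk, which is what makes a downgrade possible before finalization.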
Merging Configurations: When upgrading across major versions, e.g. HDP 2.2 -> 2.3, Ambari has to merge configs. E.g.:
HDP 2.2 default configs = base
HDP 2.3 default configs = desired
Ambari has rules for how to add, rename, transform, and delete configs from the base stack to the desired stack. Any properties that the user modified in the base stack will be preserved, even if the new stack has a different value.
Error Handling: All operations are retry-able in the case of an error, and we've made a considerable effort to ensure all ops are idempotent. Moreover, non-critical steps (such as Service Checks and the higher-level services) can be skipped, since they can always be fixed right before finalizing. In Ambari 2.1.2, Ambari introduced two values that allow the Upgrade Packs to automatically skip Service Check or Component Failures. See AMBARI-13032. In Ambari 2.1.3, these error-handling options can be controlled at run-time, which makes it easier to ignore errors until the end. See AMBARI-13018.
Further, Ambari 2.1.3 also allows suppressing manual tasks so they run silently. See AMBARI-13457.
Gotchas, Tips, and Tricks:
1. Check out this presentation: RU Tips, Tricks, Hacks
2. Always move to the latest version of Ambari.
3. Back up the Ambari database before starting the upgrade. If you run into any problems attempting to "Save the Cluster State", this is likely because some hosts/components are still on the older version. To find these components, run:
SELECT h.host_name, hcs.service_name, hcs.component_name, hcs.version FROM hostcomponentstate hcs JOIN hosts h ON hcs.host_id = h.host_id ORDER BY hcs.version, hcs.service_name, hcs.component_name, h.host_name;
To fix the components, on each host run this for the applicable components:
hdp-select versions
hdp-select set $comp_name $desired_version
and restart the components (note: you may have to do this manually, or by enabling the /#/experimental flag).
4. If you run into any problems during the upgrade, try RU Magician! It's a Python script that checks the database and can perform some updates for you.
5. For advanced users: you can still modify properties by navigating to http://$server:8080/#/experimental and enabling "opsDuringRollingUpgrade".
6. If planning to upgrade Knox to HDP 2.3.2 or higher, you must first upgrade Ambari to 2.1.2.
7. If patching tez.lib.uris, you must reset the path to the original value before starting the upgrade; otherwise, Ambari will persist the value of the patched jar, which will not work in the new version.
8. If performing a manual stack upgrade, don't forget to call this to save the new version as "current":
ambari-server set-current --cluster-name=$CLUSTERNAME --version-display-name=$VERSION_NAME
In Ambari 2.1.3, you can force the finalization by running the command above with "--force". See AMBARI-13591. In Ambari 2.2, you can now force the finalization, thereby skipping any errors.
Example of APIs:
Run the pre-checks:
curl -u $admin:$password -X POST -H 'X-Requested-By:admin' http://$server:8080/api/v1/clusters/$name/rolling...
Start the upgrade:
curl -u $admin:$password -X POST -H 'X-Requested-By:admin' http://$server:8080/api/v1/clusters/$name/upgrade... -d '{"Upgrade":{"repository_version":"2.3.0.0-2557", "type":"ROLLING"}}'
Check the status:
curl -u $admin:$password -X GET -H 'X-Requested-By:admin' http://$server:8080/api/v1/clusters/c1/upgrades
Debugging & Logging: If the upgrade fails to Finalize, find out which hosts and components are still not on the newer version.
-- Check the repo version state
SELECT rv.version, cv.state FROM repo_version rv
JOIN cluster_version cv ON rv.repo_version_id = cv.repo_version_id
ORDER BY rv.version ASC;
-- Check the hosts
SELECT rv.version, h.host_name, hv.state
FROM repo_version rv
JOIN host_version hv ON rv.repo_version_id = hv.repo_version_id
JOIN hosts h ON hv.host_id = h.host_id
ORDER BY rv.version ASC, h.host_name;
-- Find the components on the wrong version,
-- call "hdp-select set <comp> <version>", check the config symlinks, and restart them manually
SELECT hcs.service_name, hcs.component_name, h.host_name, hcs.version
FROM hostcomponentstate hcs
JOIN hosts h ON hcs.host_id = h.host_id
ORDER BY hcs.version ASC, hcs.service_name, hcs.component_name, h.host_name;
Postgres:
SELECT u.upgrade_id, u.direction, substr(g.group_title, 0, 40), substr(i.item_text, 0, 80), substr(hrc.status, 0, 40), hrc.task_id, h.host_name, hrc.output_log, hrc.error_log
FROM upgrade_group g JOIN upgrade u ON g.upgrade_id = u.upgrade_id
JOIN upgrade_item i ON i.upgrade_group_id = g.upgrade_group_id
JOIN host_role_command hrc ON hrc.stage_id = i.stage_id AND hrc.request_id = u.request_id
JOIN hosts h ON hrc.host_id = h.host_id
ORDER BY u.upgrade_id, g.upgrade_group_id, i.stage_id;
MySQL:
SELECT u.upgrade_id, u.direction, LEFT(g.group_title, 40), LEFT(i.item_text, 80), LEFT(hrc.status, 40), hrc.task_id, h.host_name, hrc.output_log, hrc.error_log
FROM upgrade_group AS g JOIN upgrade AS u ON g.upgrade_id = u.upgrade_id
JOIN upgrade_item AS i ON i.upgrade_group_id = g.upgrade_group_id
JOIN host_role_command AS hrc ON hrc.stage_id = i.stage_id AND hrc.request_id = u.request_id
JOIN hosts AS h ON hrc.host_id = h.host_id
ORDER BY u.upgrade_id, g.upgrade_group_id, i.stage_id;
If you have any questions, feel free to email user@ambari.apache.org