Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4036 | 10-11-2017 09:33 PM |
| | 3562 | 10-11-2017 07:46 PM |
| | 2569 | 08-04-2017 01:37 PM |
| | 2207 | 08-03-2017 03:36 PM |
| | 2235 | 08-03-2017 12:52 PM |
10-08-2016
04:17 PM
1 Kudo
Plan and Assess
This is purely a planning step; the expected deliverable is an Upgrade Plan.
Gather all details about the existing environment to plan the upgrade path and
associated upgrade tasks.
1) Determine Upgrade Path
Based on the current and target versions of the HDP stack, and on whether
Ambari is used, select the supported upgrade guide from the Hortonworks
documentation site. Identify key requirements, such as whether NameNode HA
(or other HA configurations) or security needs to be disabled during the upgrade.
Current version:
● HDP Stack version
● Ambari version (if Ambari is used)
● OS Version
Target version:
● HDP Stack version
● Ambari version (if Ambari is used)
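A minimal sketch for collecting the current-version details above on a cluster node (assumes an RPM-based OS and an HDP installation managed by hdp-select; run on the relevant hosts):
hdp-select versions                  # installed HDP stack version(s)
rpm -q ambari-server ambari-agent    # Ambari versions, on the hosts where each is installed
cat /etc/*release                    # OS version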
Below are some useful links:
HDP Stacks Managed by Different Ambari Versions:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.0/bk_ambari-installation/content/determine_stack_compatibility.html
Upgrading to Ambari 2.4:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.1/bk_ambari-upgrade/content/upgrading_ambari.html
Upgrading HDP Using Ambari:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.1/bk_ambari-upgrade/content/upgrading_hdp_stack.html
Upgrading HDP Manually (without Ambari):
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-upgrade/content/ch_upgrade_2_4.html
2) Review Known Issues in Target Version Release
Review the following items:
● Behavioral changes that will affect applications
● Unsupported features
● Known issues
● New features added to the release
HDP 2.5 Release Notes:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.html
HDP 2.5 Known Issues:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/known_issues.html
3) Select Validation Applications
Select two groups of validation applications.
First group: industry-standard benchmarks such as Teragen & Terasort,
TestDFSIO, Hive TPC-DS, and HBase performance tests. At a minimum, use
Teragen & Terasort, with multiple mappers for Teragen and multiple reducers
for Terasort (a sketch follows this step).
Second group (optional): user-defined validation applications. Identify
representative applications (together with their input data) that are used
most often. Be sure to include at least one for every Hadoop component in
use, such as MapReduce, Hive, Pig, HBase, Oozie, Storm, and Kafka.
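As a minimal sketch, a Teragen & Terasort run might look like the following (the examples-jar path is typical for HDP installs but may differ; the row count, task counts, and HDFS paths are placeholders to adjust):
EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
# generate 10^9 rows (~100 GB at 100 bytes per row) with 100 map tasks
hadoop jar "$EXAMPLES_JAR" teragen -Dmapreduce.job.maps=100 1000000000 /tmp/teragen
# sort the generated data with 100 reduce tasks
hadoop jar "$EXAMPLES_JAR" terasort -Dmapreduce.job.reduces=100 /tmp/teragen /tmp/terasort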
4) Finalize Project Management Items
● Scope: Identify clusters to be upgraded and components to be upgraded or newly installed (if any).
● HR: Staff the upgrade teams. Some validation applications can also be run by the developers themselves.
● Time: Identify upgrade tasks, the timeline, and task owners.
● QA: Carefully identify validation tasks.
● Risk: Estimate the downtime for each cluster upgrade.
● Resources: Prepare the cluster on which the upgrade will be tested (e.g., Dev). When upgrading production clusters, it is strongly recommended to attempt the upgrade first on a test cluster.
**See Also**
HDP Upgrade Best Practices - 2) Do the Upgrade
HDP Upgrade Best Practices - 3) Documentation and Learnings
09-23-2016
01:36 PM
Thanks @deepak sharma. We're still not on HDP 2.5. Does this apply to HDP 2.3.4 and 2.4.2, or is it only 2.5+? Also, can we connect to a secure Solr instance rather than SolrCloud?
09-23-2016
01:29 PM
My client is using Solr for Ranger audit logs. It appears that enabling Solr results in a Solr instance devoid of any security. What are the recommended paths to secure this particular instance of Solr?
Labels: Apache Ranger, Apache Solr
09-09-2016
01:44 AM
14 Kudos
**Disable Transparent Huge Pages (THP)**
Transparent Huge Pages (THP) is a Linux memory management feature that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages. However, THP is known to perform poorly on Hadoop clusters and can result in excessively high CPU utilization. Disable THP to reduce system CPU utilization on your worker nodes. This can be done by ensuring that both THP proc entries are set to [never] instead of [always].
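A minimal sketch for disabling THP at runtime (the path assumes a mainline kernel; on RHEL/CentOS 6 it is /sys/kernel/mm/redhat_transparent_hugepage instead):
echo never > /sys/kernel/mm/transparent_hugepage/enabled   # stop allocating huge pages
echo never > /sys/kernel/mm/transparent_hugepage/defrag    # stop defragmenting into huge pages
These settings do not survive a reboot, so also add them to an init script such as /etc/rc.local to make them persistent.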
**Use Recommended File System Types**
Some file systems offer better performance and stability than others. As such, the HDFS dfs.datanode.data.dir and YARN yarn.nodemanager.local-dirs properties should be configured to use mount points that are formatted with the most optimal file systems. Take a look at this article on file system choices: https://community.hortonworks.com/articles/14508/best-practices-linux-file-systems-for-hdfs.html
**Disable Host Swappiness**
The Linux kernel provides a tunable setting, called swappiness, that controls how often swap space is used. A swappiness of 0 means the disk will be avoided unless absolutely necessary (when the host runs out of memory), while a swappiness of 100 means programs will be swapped to disk almost instantly. Reducing swappiness reduces the likelihood that the kernel will push application memory out to swap space, which is much slower than RAM because it is backed by disk. Processes that are swapped to disk are likely to experience pauses, which may cause issues and missed SLAs. Add `vm.swappiness=0` to /etc/sysctl.conf and reboot for the change to take effect, or change the value while the system is running with `sysctl -w vm.swappiness=0`. You can also clear swap without rebooting by running `swapoff -a` and then `swapon -a` as root.
**Improve Virtual Memory Usage**
The vm.dirty_background_ratio and vm.dirty_ratio parameters control the percentage of system memory that can be filled with memory pages still waiting to be written to disk. Ratios that are too small force frequent I/O operations, while ratios that are too large leave too much data in volatile memory, so optimizing this ratio is a careful balance between efficient I/O and reducing the risk of data loss. Update vm.dirty_background_ratio=20 and vm.dirty_ratio=50 in /etc/sysctl.conf and reboot for the changes to take effect, or apply them to the running system with `sysctl -p`.
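As a minimal sketch, the virtual-memory settings above can be persisted and applied in one pass (the values are this article's recommendations; tune them for your workload; run as root):
# append the recommended VM settings to /etc/sysctl.conf, then load them
cat >> /etc/sysctl.conf <<'EOF'
vm.swappiness=0
vm.dirty_background_ratio=20
vm.dirty_ratio=50
EOF
sysctl -p   # apply to the running kernel without a reboot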
**Configure CPUs for Performance Scaling**
CPU frequency scaling is configurable and commonly defaults to favoring power saving over performance. For Hadoop clusters, it is important to configure the CPUs for performance. Set the scaling governors to performance, which means running the CPUs at maximum frequency. To do so, run `cpufreq-set -r -g performance`, or edit /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor and set the content to 'performance'.
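A minimal sketch of the sysfs approach, for hosts without cpufrequtils installed (assumes the cpufreq driver is loaded and exposes these files; run as root):
# set every CPU's scaling governor to performance
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > "$gov"
done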
**Tune SSD Configurations**
SSDs provide a great performance boost, and if configured optimally for Hadoop workloads they can provide even better results. The I/O scheduler, read-ahead buffer, and number of queued requests are the parameters to consider for tuning. Refer to the following link for further details: https://wiki.archlinux.org/index.php/Solid_State_Drives#I.2FO_Scheduler
For each SSD device, set the following ({{device}} is the device's sysfs path, e.g. /sys/block/sdb):
echo 'deadline' > {{device}}/queue/scheduler
echo '256' > {{device}}/queue/read_ahead_kb
echo '256' > {{device}}/queue/nr_requests
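As a minimal sketch over a hypothetical device list (replace sda/sdb with your actual SSD device names; run as root):
# apply the SSD tuning above to each listed device
for dev in sda sdb; do
  echo deadline > /sys/block/$dev/queue/scheduler
  echo 256 > /sys/block/$dev/queue/read_ahead_kb
  echo 256 > /sys/block/$dev/queue/nr_requests
done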
**You might also be interested in the following articles:** HDFS Settings for Better Hadoop Performance
09-01-2016
02:13 PM
For Secure Impersonation / proxyusers, is there a way to blacklist certain users so even if they are added to the group, they won’t be allowed to be impersonated?
09-01-2016
05:51 AM
1 Kudo
Neither. Cloudbreak 2, which will be launched in a few weeks, is the appropriate version for deploying HDP 2.5.
09-01-2016
05:50 AM
2 Kudos
While Cloudbreak 1.3 is available, the first link is more accurate in that it is in Technical Preview. Cloudbreak 2 is expected to launch in a few weeks, so if possible I would wait for that, especially if you're interested in deploying HDP 2.5.
08-31-2016
02:40 PM
1 Kudo
@sandrine G HDP 2.5 includes both Hive 1.2.1 and Hive 2.1. However, Hive 2.1 is in technical preview and is not supported. It can be enabled from Ambari if you'd like to give it a try: http://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/
08-29-2016
04:51 PM
1 Kudo
Can someone please share how to use distcp + Oozie (not Falcon) for cluster DR/replication? My understanding is that the entire distcp job will fail if any file in the path is being written to, and that the best way around that is to run distcp against snapshots (see the sketch below). But what is the entire end-to-end process? Also, what checks can be done on the DR cluster to ensure the job succeeded and that the data is in sync with the metastore?
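To illustrate, a minimal sketch of the snapshot-based approach I have in mind (assumes both directories are snapshottable and the target already matches source snapshot s1; hostnames, paths, and snapshot names are placeholders):
# take a new snapshot on the source, then copy only the delta since the last sync
hdfs dfs -createSnapshot /data/src s2
hadoop distcp -update -diff s1 s2 hdfs://source-nn:8020/data/src hdfs://dr-nn:8020/data/src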
Labels: Apache Hadoop
08-29-2016
04:08 PM
1 Kudo
We have an application (Datameer) that requires superuser access by being a member of the HDFS supergroup. What options are available for securing/restricting that user's access to files and folders on HDFS? With Ranger 0.6+ (HDP 2.5+) we can use Deny or Exclude Conditions (https://cwiki.apache.org/confluence/display/RANGER/Deny-conditions+and+excludes+in+Ranger+policies), but what do we do with previous versions like HDP 2.4 (Ranger 0.5.2)?
Labels: Apache Ranger