Member since: 09-29-2015
286 Posts
601 Kudos Received
60 Solutions
09-24-2018 02:53 PM
Request for @Ancil McBarnett (or anyone else who knows): Please elaborate a little on ... "You do not want Derby in your cluster."
06-23-2016 12:30 PM
Thank you. The Hortonworks docs are very sparse on this; I never would have guessed these commands without your article. Awesome!
03-01-2016 02:54 PM
@Ancil: Is this guide still valid for an Ambari / HDP 2.3 deployment on EC2, please? The description states: "**** Just an Initial Place Holder for an Old KB on Ambari on EC2 to be updated". The manual does not mention details like "configure nodes - especially /etc/network" and "set up hosts" that we find in this post. Thanks, Sundar
02-04-2016 07:58 PM
6 Kudos
ISSUE: Choosing the appropriate Linux file system for an HDFS deployment.

SOLUTION: The Hadoop Distributed File System is platform independent and can function on top of any underlying file system and operating system. Linux offers a variety of file system choices, each with caveats that have an impact on HDFS. As a general best practice, if you are mounting disks solely for Hadoop data, mount them with the 'noatime' option, which disables access-time updates. This speeds up reads for files.
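For instance, a minimal sketch of an /etc/fstab entry for a dedicated Hadoop data disk (the device and mount point below are placeholders, not from the original article):

# /etc/fstab - hypothetical entry for a Hadoop-only data disk
/dev/sdb1  /grid/0  ext3  defaults,noatime  0  0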
There are three popular Linux file system options to choose from: ext3, ext4, and XFS.

ext3: Yahoo uses the ext3 file system for its Hadoop deployments, and ext3 is also the default file system for many popular Linux flavours. Since HDFS on ext3 has been publicly tested on Yahoo's clusters, it is a safe choice for the underlying file system.

ext4: ext4 is the successor to ext3 and has better performance with large files. ext4 also introduced delayed allocation of data, which adds a bit more risk with unplanned server outages while decreasing fragmentation and improving performance.

XFS: XFS offers better disk space utilization than ext3 and has much quicker disk formatting times, so it is quicker to bring up a data node using XFS.

Most often the performance of a Hadoop cluster is not constrained by disk speed; I/O and RAM limitations matter more. ext3 has been extensively tested with Hadoop and is currently the stable option to go with. ext4 and XFS can be considered as well, and they offer some performance benefits.

References:
http://wiki.apache.org/hadoop/DiskSetup
http://hadoop-common.472056.n3.nabble.com/Hadoop-performance-xfs-and-ext4-td742325.html
http://www.quora.com/What-are-the-advantages-and-disadvantages-of-the-filesystems-ext2-ext3-ext4-ReiserFS-and-XFS
04-25-2016 07:19 PM
Ancil, I have a question regarding: hive.tez.container.size should be a multiple of yarn.scheduler.minimum-allocation-mb - why so? If yarn.scheduler.maximum-allocation-mb = 24GB, yarn.scheduler.minimum-allocation-mb = 4GB, and hive.tez.container.size = 5GB, would YARN not be smart enough to assign 5GB to a container to satisfy Tez's needs? Thanks, Richard
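For what it's worth, my understanding (an assumption on my part, not stated in this thread) is that the YARN CapacityScheduler normalizes every container request up to the next multiple of yarn.scheduler.minimum-allocation-mb, so a non-multiple request silently over-allocates:

/* Assumption: requests are rounded up to a multiple of the minimum allocation */
yarn.scheduler.minimum-allocation-mb = 4096   /* 4 GB */
hive.tez.container.size = 5120                /* 5 GB requested */
/* 5120 MB is normalized up to 8192 MB (2 x 4096): YARN reserves 8 GB while
   the Tez container JVM only uses 5 GB, wasting ~3 GB per container.
   Making hive.tez.container.size an exact multiple avoids the waste. */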
01-21-2016 04:02 PM
5 Kudos
dfs.datanode.max.xcievers / dfs.datanode.max.transfer.threads = 4096 (use 16k if running HBase)
dfs.datanode.balance.max.concurrent.moves = 500 (can go to 1000 if needed)
/* Each data node has limited bandwidth for rebalancing; the default is 5 MB/s
   (dfs.datanode.balance.bandwidthPerSec = 5242880). In the worst case, each
   data transfer is limited to 1 MB/s. */
dfs.datanode.balance.bandwidthPerSec = 104857600 /* 100 MB/s */
hdfs balancer -Dfs.defaultFS=hdfs://<NN_HOSTNAME>:8020 \
  -Ddfs.balancer.movedWinWidth=5400000 \
  -Ddfs.balancer.moverThreads=1000 \
  -Ddfs.balancer.dispatcherThreads=200 \
  -Ddfs.datanode.balance.max.concurrent.moves=5 \
  -Ddfs.balance.bandwidthPerSec=100000000 \
  -Ddfs.balancer.max-size-to-move=10737418240 \
  -threshold 5
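A related runtime option (my addition, not part of the original tip): the per-datanode balancer bandwidth can also be raised on a live cluster with dfsadmin, without a config change or datanode restart:

# set balancer bandwidth to 100 MB/s on all live datanodes
# (takes effect immediately, but is not persisted across restarts)
hdfs dfsadmin -setBalancerBandwidth 104857600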
01-16-2016 06:12 PM
4 Kudos
Question: I am about to run the cluster install wizard on a new Ambari install. I reviewed the information on service users at http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_defining_service_users_and_groups_for_a_hd and I am wondering whether I should take the "Skip Group Modifications" option. The doc states: "Choosing this option is typically required if your environment manages groups using LDAP and not on the local Linux machines." In our environment, users and groups are managed via Active Directory (via Centrify). We are planning to enable security on the cluster after it is installed; that will include a host of new users being created, after which many of the initial users and groups will be orphaned. What does the "Skip group modifications" option actually do? Should it be used in this case?

Answer: I believe the answer lies in the fact that Ambari runs a groupmod hadoop command; if there is no local group called hadoop, or groupmod is not allowed in your environment, the install fails. Since you will be integrating with LDAP or AD, you should use "Skip Group Modifications": if your Linux nodes reference groups from LDAP, the groupmod hadoop command would fail during install. See http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Installing_HDP_AMB/content/_customize_services.html:

"Service Account Users and Groups: The service account users and groups are available under the Misc tab. These are the operating system accounts the service components will run as. If these users do not exist on your hosts, Ambari will automatically create the users and groups locally on the hosts. If these users already exist, Ambari will use those accounts. Depending on how your environment is configured, you might not allow groupmod or usermod operations. If this is the case, you must be sure all users and groups are already created and be sure to select the "Skip group modifications" option on the Misc tab. This tells Ambari to not modify group membership for the service users."

Also see http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_troubleshooting/content/_resolving_cluster_install_and_configuration_problems.html:

"3.7. Problem: Cluster Install Fails with Groupmod Error. The cluster fails to install with an error related to running groupmod. This can occur in environments where groups are managed in LDAP, and not on local Linux machines. You may see an error message similar to the following one:

Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop' does not exist in /etc/group

3.7.1. Solution: When installing the cluster using the Cluster Installer Wizard, at the Customize Services step, select the Misc tab and choose the Skip group modifications during install option."
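A quick way to confirm which case you are in (my own suggestion, not from the original answer) is to check where the hadoop group resolves from before starting the wizard:

# resolves through NSS, so it sees local files as well as LDAP/AD (e.g. via Centrify/SSSD)
getent group hadoop
# shows the group only if it is defined locally; groupmod needs a local /etc/group entry
grep '^hadoop:' /etc/group
# if getent finds the group but /etc/group does not, groupmod would fail:
# choose "Skip group modifications"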
05-13-2017 05:08 PM
This helped me a lot, thank you! I just needed to add --allow-root.
03-02-2016 08:01 PM
Please note, this tutorial is deprecated as of the HDP 2.4 release.