08-27-2018 08:30 PM
The article doesn't indicate this, so for reference: the listed HDFS settings do not exist by default. They need to go into hdfs-site.xml, which is done in Ambari by adding fields under "Custom hdfs-site" (an XML sketch of the resulting entries follows the step list below):

dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0

Additionally, I found that after making this change, both NameNodes under HA came up as standby; the article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.html supplied the missing step of running a ZooKeeper format. I have not tested the steps below against a Production cluster, and if you foolishly choose to follow these steps, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-Prod environment:

1. Note the Active NameNode.
2. In Ambari, stop ALL services except for ZooKeeper.
3. In Ambari, make the indicated changes to HDFS.
4. Get to the command line on the Active NameNode (see Step 1 above).
5. At the command line you opened in Step 4, run: `sudo -u hdfs hdfs zkfc -formatZK`
6. Start the JournalNodes.
7. Start the ZKFCs.
8. Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
9. Start the DataNodes.
10. Restart/refresh any remaining HDFS components that have stale configs.
11. Start the remaining cluster services.

It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).
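For reference, here is a sketch of how those four properties render into hdfs-site.xml once Ambari applies the "Custom hdfs-site" fields (Ambari manages the file itself; this is just what the generated entries look like):

```xml
<!-- Rendered by Ambari from "Custom hdfs-site"; do not hand-edit on Ambari-managed hosts -->
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
</property>
```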
02-13-2016 10:16 AM
1 Kudo
We know Hadoop is used in a clustered environment: each cluster has multiple racks, and each rack has multiple DataNodes. To make HDFS fault tolerant in your cluster, you need to consider the following failures:

- DataNode failure
- Rack failure

The chance of a whole-cluster failure is fairly low, so let's not think about it. In the above cases you need to make sure that:

- If one DataNode fails, you can get the same data from another DataNode.
- If the entire rack fails, you can get the same data from another rack.

That's why I think the default replication factor is set to 3: no two replicas go to the same DataNode, and at least one replica goes to a different rack, fulfilling the fault-tolerance criteria above (see the command sketch below). Hope this will help.
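To make that concrete, here is a sketch of how you can inspect replica placement and change a file's replication factor from the command line (the path is hypothetical):

```sh
# Show block locations and rack placement for a file
hdfs fsck /user/alice/data.csv -files -blocks -racks

# Set the replication factor for that file and wait until it is met
hdfs dfs -setrep -w 3 /user/alice/data.csv
```

The cluster-wide default comes from the dfs.replication property in hdfs-site.xml; the commands above only override it per path.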
09-05-2018 04:23 AM
I have a 4-node cluster and this did not work for me. Same error: "/bucket_00003 could only be written to 0 of the 1 minReplication nodes. There are 4 datanode(s) running and no node(s) are excluded in this operation."
11-20-2015 06:44 AM
1 Kudo
Note: Credit for the key piece of information to solve this problem goes to Phil D'Amore.

A customer had a problem writing a Java application that used the Hive client libraries to talk to two secure Hadoop clusters that resided in different Kerberos realms. The same problem could be encountered by a client connecting to a single secure Hadoop cluster that happened not to be in the Kerberos "default_realm" as specified in the client host's krb5.conf file. The same problem could occur for any Hadoop ecosystem client, not just Hive clients.

In order to communicate with two different secure Hadoop clusters, in different Kerberos realms, the client application did the following things correctly:

- It harvested the needed configuration files (in this case, core-site.xml, hdfs-site.xml, and hive-site.xml) from each target cluster, and used the appropriate configuration when communicating with each respective cluster.
- Its application user ID had two Kerberos principals, one registered and authenticated with each of the two KDCs, and used the appropriate principal when authenticating to each respective cluster.
- On the client host, it had a krb5.conf file that correctly specified Kerberos kdc and admin_server values for each of the two target realms in the [realms] section, and set one of the realms as the "default_realm" in the [libdefaults] section. (It could also have set a third realm as the default_realm; that would just mean both target clusters were in non-default realms, which is also fine.)

However, when they ran the application, they had a puzzling problem: they were able to authenticate to the target cluster in the default realm, but failed with the target cluster in the non-default realm. Indeed, after the failure they found logs in the default_realm KDC that showed an incorrect attempt to authenticate to the wrong KDC.

They knew they had not made a coding error, because changing the default_realm to the other target cluster caused the situation to reverse. Depending on the setting of default_realm in the krb5.conf file, they could talk to either cluster, but not both at once.

The problem was fixed by adding a [domain_realm] section to the krb5.conf file. It turns out that the Thrift libraries underlying the client have APIs that do not communicate the target "realm", but only the target server. The Kerberos libraries are responsible for translating from the target server's domain to the target realm. If the domain and the realm have identical string values (except for upper/lower case), which is common but not required, the Kerberos library will use that. Failing that, it will use the default realm. It will not infer the realm from the domain of the KDC servers. In this case the domain and realm were different, so the authentication request for the non-default realm was being sent to the default realm's KDC. Adding a [domain_realm] section to the krb5.conf file allows arbitrary mappings from target domains to target realms, so Kerberos was finally able to translate from the desired target domain to the correct target realm (a sketch of such a krb5.conf follows below).

See http://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#domain-realm for details of the krb5.conf file sections and contents.
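For illustration, here is a minimal krb5.conf sketch with two realms and a [domain_realm] mapping; all realm, domain, and host names here are hypothetical:

```ini
[libdefaults]
    default_realm = PROD.EXAMPLE.COM

[realms]
    PROD.EXAMPLE.COM = {
        kdc = kdc.prod-hadoop.example.com
        admin_server = kdc.prod-hadoop.example.com
    }
    ANALYTICS.EXAMPLE.ORG = {
        kdc = kdc.analytics-hadoop.example.org
        admin_server = kdc.analytics-hadoop.example.org
    }

[domain_realm]
    ; Map server domains to realms explicitly, because the domain names
    ; do not match the realm names (even ignoring case).
    .prod-hadoop.example.com = PROD.EXAMPLE.COM
    .analytics-hadoop.example.org = ANALYTICS.EXAMPLE.ORG
```

With a mapping like this in place, a connection to any host under either domain resolves to the correct realm, regardless of which realm is set as the default.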