Member since
09-28-2015
Posts: 51
Kudos Received: 32
Solutions: 17
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1423 | 04-13-2018 11:36 PM
 | 3933 | 04-13-2018 11:03 PM
 | 1188 | 04-13-2018 10:56 PM
 | 3683 | 04-10-2018 03:12 PM
 | 4694 | 02-13-2018 07:23 PM
08-08-2022
03:17 AM
@husseljo, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
04-12-2020
05:25 AM
Below is a very good article on the differences between Hadoop 2.x and Hadoop 3.x: Difference Between Hadoop 2 and Hadoop 3
02-01-2018
04:17 PM
Can you check your hadoop.kms.authentication.kerberos.name.rules setting in kms-site.xml? Try "DEFAULT" if you have a customized setting that is invalid. You mentioned that the KMS principal was changed. Can you also post your hadoop.kms.authentication.kerberos.principal setting and the hadoop.security.auth_to_local setting from core-site.xml?
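For reference, a minimal kms-site.xml sketch using the default translation rule (the property names are the standard Hadoop KMS keys; the principal value below is only a placeholder and must match your environment):

<property>
<name>hadoop.kms.authentication.kerberos.name.rules</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.kms.authentication.kerberos.principal</name>
<!-- placeholder principal; replace with the actual KMS service principal -->
<value>HTTP/kms-host.example.com@EXAMPLE.COM</value>
</property>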
01-02-2018
06:13 PM
1 Kudo
@Prateek Behera The disk quota in your case works as expected. HDFS has a default replication factor of 3, as you can see in the third column of your CLI output: 500 MB * 3 (replication factor) = 1.5 GB > 1 GB (quota).
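As a quick way to see this on your own cluster (a sketch only; /user/test and the 1 GB value are placeholders), the space quota counters reported by hdfs dfs -count -q already account for replication:

# set a 1 GB space quota on a directory (placeholder path and size)
hdfs dfsadmin -setSpaceQuota 1g /user/test
# the SPACE_QUOTA and REM_SPACE_QUOTA columns show the quota and what remains after replicated bytes are charged
hdfs dfs -count -q /user/test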
08-27-2018
06:14 PM
@Daniel Muller, can you grep "Safe mode is" in the HDFS NameNode log? That will tell you why the NameNode is not leaving safe mode.
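For example (a sketch; the log directory and file name are assumptions that vary by distribution and hostname):

# print the most recent safe mode status messages from the NameNode log
grep "Safe mode is" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 5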
11-03-2017
08:37 AM
Thank you @Xiaoyu Yao, it works!
10-05-2018
12:10 AM
1 Kudo
This seems to be a bogus replay exception when running the Solr service. Changing hadoop-env.sh or the Solr JVM options to add -Dsun.security.krb5.rcache=none should fix the problem.

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.rcache=none ${HADOOP_OPTS}"
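If the exception is raised on the Solr side instead, the same flag can be appended to Solr's JVM options; a sketch assuming Solr is started through solr.in.sh (file location varies by install):

# in solr.in.sh (path varies, e.g. /etc/default/solr.in.sh or <solr install>/bin/solr.in.sh)
SOLR_OPTS="$SOLR_OPTS -Dsun.security.krb5.rcache=none"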
06-07-2017
09:19 AM
1 Kudo
@Silvio del Val, clear /etc/resolv.conf. I think the problem is DNS resolution.
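A rough sketch of the steps implied above (the hostname is a placeholder; back up the file before editing it):

# back up and inspect the resolver configuration, then remove stale nameserver entries
cp /etc/resolv.conf /etc/resolv.conf.bak
cat /etc/resolv.conf
# verify that cluster hostnames now resolve (placeholder hostname)
getent hosts nn1.example.com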
07-06-2016
08:40 PM
4 Kudos
We have seen many incidents of an overloaded HDFS NameNode due to 1) misconfiguration or 2) "bad" MapReduce jobs or Hive queries that generate a large number of RPC requests in a short period of time. Quite a few features have been introduced in HDP 2.3/2.4 to protect the HDFS NameNode. This article summarizes the deployment steps for these features, along with an incomplete list of known issues and possible solutions.
- Enable Async Audit Logging
- Dedicated Service RPC Port
- Dedicated Lifeline RPC Port for HA
- Enable FairCallQueue on the Client RPC Port
- Enable RPC Client Backoff on the Client RPC Port
- Enable RPC Caller Context to track the "bad" jobs
- Enable response-time-based backoff with DecayRpcScheduler
- Check JMX for NameNode client RPC call queue length and average queue time
- Check JMX for NameNode DecayRpcScheduler when FCQ is enabled
- NNtop (HDFS-6982)

1. Enable Async Audit Logging

Enable async audit logging by setting "dfs.namenode.audit.log.async" to true in hdfs-site.xml. This can minimize the impact of audit log I/O on NameNode performance.

<property>
<name>dfs.namenode.audit.log.async</name>
<value>true</value>
</property>

2. Dedicated Service RPC Port

Configuring a separate service RPC port can improve the responsiveness of the NameNode by allowing DataNode and client requests to be processed via separate RPC queues. DataNodes and all other services should connect to the new service RPC address, while clients connect to the well-known address specified by dfs.namenode.rpc-address. Adding a service RPC port to an HA cluster with automatic failover via ZKFCs (with or without Kerberos) requires some additional steps, as follows:

1. Add the following settings to hdfs-site.xml.

<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
<value>nn1.example.com:8040</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
<value>nn2.example.com:8040</value>
</property>

2. If the cluster is not Kerberos enabled, skip this step. If the cluster is Kerberos enabled, create two new hdfs_jaas.conf files for nn1 and nn2 and copy them to /etc/hadoop/conf/hdfs_jaas.conf, respectively.

nn1:

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/nn.service.keytab"
principal="nn/c6401.ambari.apache.org@EXAMPLE.COM";
};

nn2:

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/nn.service.keytab"
principal="nn/c6402.ambari.apache.org@EXAMPLE.COM";
};

Add the following to hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-Dzookeeper.sasl.client=true -Dzookeeper.sasl.client.username=zookeeper -Djava.security.auth.login.config=/etc/hadoop/conf/hdfs_jaas.conf -Dzookeeper.sasl.clientconfig=Client ${HADOOP_NAMENODE_OPTS}"

3. Restart the NameNodes.

4. Restart the DataNodes so they connect to the new NameNode service RPC port instead of the NameNode client RPC port.

5. Stop the ZKFC processes on both NameNodes.

6. Run the following command to reset the ZKFC state in ZooKeeper:

hdfs zkfc -formatZK

Known issues:

1. Without step 6, you will see the following exception after ZKFC restart.

java.lang.RuntimeException: Mismatched address stored in ZK for NameNode

2. Without step 2 in a Kerberos-enabled HA cluster, you will see the following exception when running step 6.

16/03/23 03:30:53 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/hdp64ha from ZK...
16/03/23 03:30:53 ERROR ha.ZKFailoverController: Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/hdp64ha
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:380)
at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:267)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:212)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:442)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:183)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/hdp64ha
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:375)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:372)
at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1041)
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:372)
... 11 more

3. Dedicated Lifeline RPC Port for HA

HDFS-9311 allows using a separate RPC address to isolate health checks and liveness from the client RPC port, which could be exhausted by "bad" jobs. Here is an example of configuring this feature in an HA cluster.

<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
<value>nn1.example.com:8050</value>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
<value>nn2.example.com:8050</value>
</property>
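To verify the effect of these settings, the NameNode JMX metrics mentioned in the list above can be queried over HTTP; a sketch assuming the default NameNode HTTP port 50070 and client RPC port 8020 (the bean names follow the RpcActivityForPort<port> pattern, so adjust the port and hostname for your cluster):

# client RPC call queue length and average RPC queue time
curl -s 'http://nn1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020'
# DecayRpcScheduler metrics, only published once FairCallQueue/DecayRpcScheduler is enabled
curl -s 'http://nn1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=DecayRpcSchedulerMetrics2.ipc.8020'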