Member since
09-28-2015
Posts: 51
Kudos Received: 32
Solutions: 17
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1423 | 04-13-2018 11:36 PM
 | 3933 | 04-13-2018 11:03 PM
 | 1188 | 04-13-2018 10:56 PM
 | 3683 | 04-10-2018 03:12 PM
 | 4694 | 02-13-2018 07:23 PM
08-08-2022
03:17 AM
@husseljo, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
04-12-2020
05:25 AM
Below is a very good article on the differences between Hadoop 2.x and Hadoop 3.x: Difference Between Hadoop 2 and Hadoop 3
02-01-2018
04:17 PM
Can you check your hadoop.kms.authentication.kerberos.name.rules setting in kms-site.xml? Try "DEFAULT" if you have a customized setting that is invalid. You mentioned that the KMS principal was changed. Can you also post your hadoop.kms.authentication.kerberos.principal setting and the hadoop.security.auth_to_local setting from core-site.xml?
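For reference, a minimal kms-site.xml sketch using the default translation rule (the property names are the standard Hadoop KMS keys; the principal value below is only a placeholder and must match your environment):

<property>
<name>hadoop.kms.authentication.kerberos.name.rules</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.kms.authentication.kerberos.principal</name>
<!-- placeholder principal; replace with the actual KMS service principal -->
<value>HTTP/kms-host.example.com@EXAMPLE.COM</value>
</property>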
01-02-2018
06:13 PM
1 Kudo
@Prateek Behera The disk quota in your case works as expected. HDFS has a default replication factor of 3, as you can see in the third column of your CLI output: 500 MB * 3 (replication factor) = 1.5 GB > 1 GB (quota).
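As a quick way to see this on your own cluster (a sketch only; /user/test and the 1 GB value are placeholders), the space quota counters reported by hdfs dfs -count -q already account for replication:

# set a 1 GB space quota on a directory (placeholder path and size)
hdfs dfsadmin -setSpaceQuota 1g /user/test
# the SPACE_QUOTA and REM_SPACE_QUOTA columns show the quota and what remains after replicated bytes are charged
hdfs dfs -count -q /user/test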
08-27-2018
06:14 PM
@Daniel Muller, can you grep "Safe mode is" in the HDFS NameNode log? That will tell you why the NameNode is not leaving safe mode.
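For example (a sketch; the log directory and file name are assumptions that vary by distribution and hostname):

# print the most recent safe mode status messages from the NameNode log
grep "Safe mode is" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 5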
11-03-2017
08:37 AM
Thank you @Xiaoyu Yao, it works!
10-05-2018
12:10 AM
1 Kudo
This seems to be a bogus replay exception when running the Solr service. Changing hadoop-env.sh or the Solr JVM options to add -Dsun.security.krb5.rcache=none should fix the problem.

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.rcache=none ${HADOOP_OPTS}"
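If the exception is raised on the Solr side instead, the same flag can be appended to Solr's JVM options; a sketch assuming Solr is started through solr.in.sh (file location varies by install):

# in solr.in.sh (path varies, e.g. /etc/default/solr.in.sh or <solr install>/bin/solr.in.sh)
SOLR_OPTS="$SOLR_OPTS -Dsun.security.krb5.rcache=none"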
06-07-2017
09:19 AM
1 Kudo
@Silvio del Val, clear /etc/resolv.conf. I think the problem is DNS resolution.
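A rough sketch of the steps implied above (the hostname is a placeholder; back up the file before editing it):

# back up and inspect the resolver configuration, then remove stale nameserver entries
cp /etc/resolv.conf /etc/resolv.conf.bak
cat /etc/resolv.conf
# verify that cluster hostnames now resolve (placeholder hostname)
getent hosts nn1.example.com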
07-06-2016
08:40 PM
4 Kudos
We have seen many incidents of an overloaded HDFS NameNode due to 1) misconfiguration or 2) "bad" MapReduce jobs or Hive queries that generate a large number of RPC requests in a short period of time. Quite a few features have been introduced in HDP 2.3/2.4 to protect the HDFS NameNode. This article summarizes the deployment steps for these features, along with an incomplete list of known issues and possible solutions.
- Enable Async Audit Logging
- Dedicated Service RPC Port
- Dedicated Lifeline RPC Port for HA
- Enable FairCallQueue on the Client RPC Port
- Enable RPC Client Backoff on the Client RPC Port
- Enable RPC Caller Context to track the "bad" jobs
- Enable response-time-based backoff with DecayRpcScheduler
- Check JMX for NameNode client RPC call queue length and average queue time
- Check JMX for NameNode DecayRpcScheduler when FCQ is enabled
- NNtop (HDFS-6982)

1. Enable Async Audit Logging

Enable async audit logging by setting "dfs.namenode.audit.log.async" to true in hdfs-site.xml. This can minimize the impact of audit log I/O on NameNode performance.

<property>
<name>dfs.namenode.audit.log.async</name>
<value>true</value>
</property>

2. Dedicated Service RPC Port

Configuring a separate service RPC port can improve the responsiveness of the NameNode by allowing DataNode and client requests to be processed via separate RPC queues. DataNodes and all other services should connect to the new service RPC address, while clients connect to the well-known address specified by dfs.namenode.rpc-address. Adding a service RPC port to an HA cluster with automatic failover via ZKFCs (with or without Kerberos) requires some additional steps, as follows:

1. Add the following settings to hdfs-site.xml.

<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
<value>nn1.example.com:8040</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
<value>nn2.example.com:8040</value>
</property>

2. If the cluster is not Kerberos enabled, skip this step. If the cluster is Kerberos enabled, create two new hdfs_jaas.conf files for nn1 and nn2 and copy them to /etc/hadoop/conf/hdfs_jaas.conf, respectively.

nn1:

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/nn.service.keytab"
principal="nn/c6401.ambari.apache.org@EXAMPLE.COM";
};

nn2:

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/nn.service.keytab"
principal="nn/c6402.ambari.apache.org@EXAMPLE.COM";
};

Add the following to hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-Dzookeeper.sasl.client=true -Dzookeeper.sasl.client.username=zookeeper -Djava.security.auth.login.config=/etc/hadoop/conf/hdfs_jaas.conf -Dzookeeper.sasl.clientconfig=Client ${HADOOP_NAMENODE_OPTS}"

3. Restart the NameNodes.

4. Restart the DataNodes so they connect to the new NameNode service RPC port instead of the NameNode client RPC port.

5. Stop the ZKFC processes on both NameNodes.

6. Run the following command to reset the ZKFC state in ZooKeeper:

hdfs zkfc -formatZK

Known issues:

1. Without step 6, you will see the following exception after ZKFC restart.

java.lang.RuntimeException: Mismatched address stored in ZK for NameNode

2. Without step 2 in a Kerberos-enabled HA cluster, you will see the following exception when running step 6.

16/03/23 03:30:53 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/hdp64ha from ZK...
16/03/23 03:30:53 ERROR ha.ZKFailoverController: Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/hdp64ha
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:380)
at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:267)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:212)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:442)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:183)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/hdp64ha
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:375)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:372)
at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1041)
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:372)
... 11 more

3. Dedicated Lifeline RPC Port for HA

HDFS-9311 allows using a separate RPC address to isolate health checks and liveness from the client RPC port, which could be exhausted by "bad" jobs. Here is an example of configuring this feature in an HA cluster.

<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
<value>nn1.example.com:8050</value>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
<value>nn2.example.com:8050</value>
</property>
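To verify the effect of these settings, the NameNode JMX metrics mentioned in the list above can be queried over HTTP; a sketch assuming the default NameNode HTTP port 50070 and client RPC port 8020 (the bean names follow the RpcActivityForPort<port> pattern, so adjust the port and hostname for your cluster):

# client RPC call queue length and average RPC queue time
curl -s 'http://nn1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020'
# DecayRpcScheduler metrics, only published once FairCallQueue/DecayRpcScheduler is enabled
curl -s 'http://nn1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=DecayRpcSchedulerMetrics2.ipc.8020'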