Member since: 07-30-2019
Posts: 111
Kudos Received: 186
Solutions: 35
My Accepted Solutions
Title | Views | Posted
---|---|---
| 3936 | 02-07-2018 07:12 PM
| 3177 | 10-27-2017 06:16 PM
| 3241 | 10-13-2017 10:30 PM
| 5724 | 10-12-2017 10:09 PM
| 1625 | 06-29-2017 10:19 PM
07-25-2017
03:41 AM
I ran into the same issue, but it was automatically fixed after restarting my DataNode server (rebooting the physical Linux server).
07-26-2016
12:16 AM
1 Kudo
You may run into slow Hadoop service start on your OS X development laptop. You can check this by opening up your service logs and looking for large (5-10 second) gaps between successive log entries at startup.
Diagnosis
It often manifests as test failures for MiniDFSCluster-based tests that use short timeouts (<10 seconds). Here is an example from a NameNode log file with a 5 second stall at startup:
2016-07-25 14:57:37,982 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-07-25 14:57:43,060 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
Another 5 second stall during NameNode startup:
2016-07-25 14:57:48,790 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2016-07-25 14:57:53,914 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
Resolution
If you see this behavior you are likely running into an OS X bug. The fix is to put all your entries for localhost on one line, as described in this StackOverflow answer. i.e. make sure your /etc/hosts file has something like this:
# Replace myhostname with the hostname of your laptop.
#
127.0.0.1 localhost myhostname myhostname.local myhostname.Home
Instead of this:
127.0.0.1 localhost myhostname.local
127.0.0.1 myhostname myhostname.Home
Root Cause
The root cause of this problem appears to be a long delay when looking up the local host name with InetAddress.getLocalHost. The following code is a minimal repro of this problem on affected systems:
import java.net.*;

class Lookup {
    public static void main(String[] args) throws Exception {
        System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
    }
}

This program can take over 5 seconds to execute on an affected machine. Verified on OS X 10.10.5 with Oracle JDK 1.8.0_91 and 1.7.0_79.
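If you want to see the stall as a number, here is a small timing variant of the repro; it is just an illustrative sketch (the class name LookupTimer is mine, not part of the original repro). Run it before and after fixing /etc/hosts to confirm the difference.

import java.net.InetAddress;

class LookupTimer {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // This is the call that stalls on affected OS X systems.
        String name = InetAddress.getLocalHost().getCanonicalHostName();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(name + " resolved in " + elapsedMs + " ms");
    }
}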
06-23-2017
04:29 PM
Thanks for the very useful article. Will there be any follow-ups coming? I am particularly interested in the changes that came about from https://issues.apache.org/jira/browse/HDFS-8818 and how settings like dfs.balancer.moverThreads need to be increased from the default when balancing a large number of unbalanced nodes (e.g. the setting used in this comment on HDFS-8818: https://issues.apache.org/jira/browse/HDFS-8818?focusedCommentId=15997429&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15997429).
11-08-2018
06:28 PM
Hi @Arpit Agarwal, in HADOOP-10597 the earliest version mentioned with the fix is 2.7.4. Is the HDP version recommended in the post correct? I'm running HDP 2.6.1 with Hadoop 2.7.3 and I would like to confirm whether this parameter can be enabled or not. Thanks!
07-07-2016
04:13 AM
16 Kudos
Introduction
This article continues where part 1 left off. It describes a few more configuration settings that can be enabled in CDP, HDP, CDH, or Apache Hadoop clusters to help the NameNode scale better.
Audience
This article is for Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager you should know how to manage services and configurations. It is assumed that you have read part 1 of the article.
RPC Handler Count
The Hadoop RPC server consists of a single RPC queue per port and multiple handler (worker) threads that dequeue and process requests. If the number of handlers is insufficient, then the RPC queue starts building up and eventually overflows. You may start seeing task failures and eventually job failures and unhappy users.
It is recommended that the RPC handler count be set to 20 * log2(Cluster Size) with an upper limit of 200.
For example, for a 250-node cluster you should initialize this to 20 * log2(250) ≈ 160. The RPC handler count can be configured with the following setting in hdfs-site.xml.
<property>
<name>dfs.namenode.handler.count</name>
<value>160</value>
</property>
This heuristic is from the excellent Hadoop Operations book. If you are using Ambari to manage your cluster, this setting can be changed via a slider in the Ambari Server Web UI. If you're using Cloudera Manager, you can search for the property name "dfs.namenode.handler.count" on the HDFS configuration page and adjust the value.
Service RPC Handler Count
Prerequisite: if you have not enabled the Service RPC port already, please do so first as described here.
There is no precise calculation for the Service RPC handler count; however, the default value of 10 is too low for most production clusters. We have often seen it initialized to 50% of dfs.namenode.handler.count in busy clusters, and this value works well in practice.
For example, for the same 250-node cluster you would initialize the service RPC handler count to 80 with the following setting in hdfs-site.xml.
<property>
<name>dfs.namenode.service.handler.count</name>
<value>80</value>
</property>
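To make the two sizing heuristics above concrete, here is a small sketch (my own illustration, not from the article) that derives suggested values for dfs.namenode.handler.count and dfs.namenode.service.handler.count from the cluster size, using 20 * log2(nodes) capped at 200, and half of that for the service handlers:

class HandlerCountCalculator {
    public static void main(String[] args) {
        int clusterSize = 250; // number of DataNodes in the cluster
        // Heuristic: 20 * log2(cluster size), capped at 200.
        int handlerCount = (int) Math.min(200, Math.ceil(20 * Math.log(clusterSize) / Math.log(2)));
        // Service RPC handlers are often set to roughly half the handler count.
        int serviceHandlerCount = handlerCount / 2;
        System.out.println("dfs.namenode.handler.count = " + handlerCount);
        System.out.println("dfs.namenode.service.handler.count = " + serviceHandlerCount);
    }
}

For a 250-node cluster this prints 160 and 80, matching the values used in the examples above.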
DataNode Lifeline Protocol
The Lifeline protocol is a feature recently added by the Apache Hadoop Community (see Apache HDFS Jira HDFS-9239). It introduces a new lightweight RPC message that is used by the DataNodes to report their health to the NameNode. It was developed in response to problems seen in some overloaded clusters where the NameNode was too busy to process heartbeats and spuriously marked DataNodes as dead.
For a non-HA cluster, the feature can be enabled with the following setting in hdfs-site.xml (replace mynamenode.example.com with the hostname or IP address of your NameNode). The port number can be different too.
<property>
<name>dfs.namenode.lifeline.rpc-address</name>
<value>mynamenode.example.com:8050</value>
</property>
For an HA cluster, the lifeline RPC port can be enabled with settings like the following, replacing mycluster, nn1 and nn2 appropriately.
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
<value>mynamenode1.example.com:8050</value>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
<value>mynamenode2.example.com:8050</value>
</property>
Additional lifeline protocol settings are documented in the HDFS-9239 release note but these can be left at their default values for most clusters.
Note: Changing the lifeline protocol settings requires a restart of the NameNodes, DataNodes, and ZooKeeper Failover Controllers to take full effect. If you have a NameNode HA setup, you can restart the NameNodes one at a time, followed by a rolling restart of the remaining components, to avoid cluster downtime.
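After the restart, a quick sanity check is to confirm that the NameNode is listening on the configured lifeline port. The sketch below is my own illustration using the placeholder hostname and port from the example above; it only attempts a TCP connection and does not speak the lifeline protocol itself.

import java.net.InetSocketAddress;
import java.net.Socket;

class LifelinePortCheck {
    public static void main(String[] args) throws Exception {
        String host = "mynamenode.example.com"; // replace with your NameNode host
        int port = 8050;                        // the lifeline RPC port configured above
        try (Socket socket = new Socket()) {
            // Throws an exception if the port is not reachable within 5 seconds.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("NameNode is listening on lifeline port " + host + ":" + port);
        }
    }
}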
Conclusion
That is it for Part 2. In Part 3 of this article, we will explore how to enable two new HDFS features to help your NameNode scale better.
10-04-2018
09:53 AM
@Elias Abacioglu You can refer to the guidance below for configuring the service RPC port: https://community.hortonworks.com/articles/223817/how-do-you-enable-namenode-service-rpc-port-withou.html
06-20-2016
09:55 AM
Option 1: Reformat. You will need not only to "copyFromLocal" the data again but also to recreate the file system. See, for example, this for details.
Option 2: Exit safe mode and find out where you are. I'd recommend this one. You can also find out what caused the trouble; maybe all the corrupted blocks are on a bad disk or something like that. You can also share the list of files if you are uncertain whether or not to restore them.
08-11-2016
12:52 PM
Hi @Arpit Agarwal,
That is my understanding as well. Thanks for a short and to-the-point answer.
11-05-2017
08:52 AM
Hi @Kuldeep Kulkarni, I have tried the same. Even after installing the Kerberos client manually, I get the same error. I am not sure why the Kerberos client test fails; I need to skip that and go to the second page. All the hosts have the Kerberos client installed (Kerberos Clients: 3; see the attached screenshot, 1.jpg).