Member since: 07-30-2019
Posts: 111
Kudos Received: 181
Solutions: 35
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1858 | 02-07-2018 07:12 PM
 | 1303 | 10-27-2017 06:16 PM
 | 1782 | 10-13-2017 10:30 PM
 | 3357 | 10-12-2017 10:09 PM
 | 702 | 06-29-2017 10:19 PM
09-08-2021
02:43 PM
This whole series is really insightful and helpful!
03-22-2021
02:43 AM
Hi @Priya09, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
03-02-2021
09:09 AM
Hello @kolli_sandeep, it looks like the Failover Controllers are down in the cluster. Please follow the steps in [1] and start the Failover Controller roles, which will transition the NameNodes to the Active/Standby states. The steps are:
1. Stop the Failover Controller roles from the HDFS > Instances page.
2. Remove the HA state from ZooKeeper. On a ZooKeeper server host, run zookeeper-client and execute the following to remove the configured nameservice (this example assumes the nameservice is named nameservice1; you can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab): rmr /hadoop-ha/nameservice1. If you don't see a /hadoop-ha znode in the ZooKeeper znode list, skip this step.
3. After removing the HA znode from ZooKeeper, go to CM and click HDFS > Instances > Federation and High Availability > Actions, then select Actions > Initialize High Availability State in ZooKeeper.
4. Start the Failover Controller roles (CM > Instances > select the FailoverControllers > Actions for Selected > Start).
5. Verify the NameNode state. If the NameNodes do not come up in the Active/Standby state, or if any step fails, restart the HDFS service.
[1] https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_hag_hdfs_ha_enabling.html
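For reference, here is roughly what the ZooKeeper cleanup in step 2 looks like on the command line (a sketch; nameservice1 is the example nameservice name used above):
zookeeper-client
# inside the ZooKeeper shell:
ls /
# if /hadoop-ha is not listed, skip the removal; otherwise:
rmr /hadoop-ha/nameservice1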
08-02-2019
09:05 AM
Thanks. To make sure we are on the same page, take the following scenario: an HDFS snapshottable location /a/b/ has a file c which is included in a snapshot. Now c is deleted from HDFS using the CLI (hdfs dfs -rm -r -skipTrash), so the NameNode transaction happens and the CLI no longer shows the file, and then a new file with the same name, content, and size is created.
- What gets stored in HDFS? What is the delta that the snapshot adds in HDFS in this case? Is it just that the snapshot still holds c's blocks in HDFS in addition to the newly created file?
- Are NameNode resources (heap) used to maintain the metadata of both?
Is this all, or is there more to it? Regards
02-27-2018
10:20 PM
2 Kudos
Building Apache Tez with Apache Hadoop 2.8.0 or later fails due to the client/server jar separation in Hadoop [1]. The build fails with the following error:
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30] cannot find symbol
symbol: class DistributedFileSystem
location: package org.apache.hadoop.hdfs
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50] cannot find symbol
symbol: class DistributedFileSystem
location: class org.apache.tez.client.TestTezClientUtils
[ERROR] /src/tez/ecosystem/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42] cannot access org.apache.hadoop.hdfs.DistributedFileSystem
To get Tez to compile successfully, you will need to use the new hadoop28 profile introduced by TEZ-3690 [2]. For example, here is how you compile Tez with Apache Hadoop 3.0.0:
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Phadoop28 -Dhadoop.version=3.0.0
References:
1. HDFS-6200: Create a separate jar for hdfs-client
2. TEZ-3690: Tez on hadoop 3 build failed due to hdfs client/server jar separation.
06-22-2017
07:46 PM
7 Kudos
The HDFS NameNode ensures that each block is sufficiently replicated. When it detects the loss of a DataNode, it instructs remaining nodes to maintain adequate replication by creating additional block replicas.
For each lost replica, the NameNode picks a (source, destination) pair where the source is an available DataNode with another replica of the block and the destination is the target for the new replica. The re-replication work can be massively parallelized in large clusters since the replica distribution is randomized.
In this article, we estimate a lower bound for the recovery time.
Simplifying Assumptions
- The maximum IO bandwidth of each disk is 100MB/s (reads + writes). This is true for the vast majority of clusters that use spinning disks.
- The aggregate IO capacity of the cluster is limited by disk and not the network. This is not always true but helps us establish lower bounds without discussing network topologies.
- Block replicas are uniformly distributed across the cluster and disk usage is uniform. True if the HDFS balancer was run recently.
Theoretical Lower Bound
Let's assume the cluster has n nodes, each node has p disks, and the usage of each disk is c TeraBytes. The data usage of each node is thus (p ⋅ c) TB.
The amount of data transfer needed for recovery is twice the capacity of the lost DataNode, as each replica must be read once from a source disk and written once to the target disk.
Data transfer during recovery = 2 ⋅ (Node Capacity)
= (2 ⋅ p ⋅ c) TB
= (2 ⋅ p ⋅ c ⋅ 1,000,000) MB
The re-replication rate is limited by the available aggregate IO bandwidth in the cluster:
Cluster aggregate IO bandwidth = (Disk IO bandwidth) ⋅ (Number of disks)
= (100 ⋅ n ⋅ p) MB/s
Thus:
Minimum Recovery Time = (Data transfer during recovery) / (Cluster aggregate IO bandwidth)
= (2 ⋅ p ⋅ c ⋅ 1,000,000) / (100 ⋅ n ⋅ p)
= (20,000 ⋅ c/n) seconds.
where:
c = Mean usage of each disk in TB.
n = Number of DataNodes in the cluster.
This is the absolute best case with no other load, no network bandwidth limits, and a perfectly efficient scheduler.
E.g. In a 100 node cluster where each disk has 4TB of data, recovery from the loss of a DataNode must take at least (20,000 ⋅ 4) / 100 = 800 seconds or approximately 13 minutes.
Clearly, the cluster size bounds the recovery time. Disk capacities being equal, a 1000 node cluster can recover 10x faster than a 100 node cluster.
A More Practical Lower Bound
The theoretical lower bound assumes that block re-replications can be instantaneously scheduled across the cluster. It also assumes that all cluster IO capacity is available for re-replication whereas in practice application reads and writes also consume IO capacity. The NameNode schedules 2 outbound replication streams per DataNode, per heartbeat interval to throttle re-replication traffic. This throttle allows DataNodes to remain responsive to applications. The throttle can be adjusted via the configuration setting dfs.namenode.replication.max-streams. Let's call this m and the heartbeat interval h.
Also, let's assume the mean block size in the cluster is b MB. Then:
Re-replication Rate = (Blocks replicated cluster-wide per heartbeat interval) / (Heartbeat interval)
= (n ⋅ m/h) Blocks/s
The total number of blocks to be re-replicated is the capacity of the lost node divided by the mean block size.
Number of Blocks Lost = (p ⋅ c) TB / (b MB)
= (p ⋅ c ⋅ 1,000,000/b).
Thus:
Recovery Time = (Number of Blocks Lost) / (Re-replication Rate)
= (p ⋅ c ⋅ 1,000,000) / (b ⋅ n ⋅ m/h)
= (p ⋅ c ⋅ h ⋅ 1,000,000) / (b ⋅ n ⋅ m) seconds.
where: p = Number of disks per node.
c = Mean usage of each disk in TB.
h = Heartbeat interval (default = 3 seconds).
b = Mean block size in MB.
n = Number of DataNodes in the cluster.
m = dfs.namenode.replication.max-streams (default = 2)
Simplifying by plugging in the defaults for h and m, we get
Minimum Recovery Time (seconds) = (p ⋅ c ⋅ 1,500,000) / (b ⋅ n)
E.g. in the same cluster, assuming the mean block size is 128MB and each node has 8 disks, the practical lower bound on recovery time will be 3,750 seconds or ~1 hour.
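As a quick sanity check of the arithmetic, here is a small sketch that evaluates both bounds with the example figures used in this article (assumes awk is available; plug in your own cluster values):
awk -v n=100 -v p=8 -v c=4 -v b=128 -v h=3 -v m=2 'BEGIN {
    # theoretical bound: 20,000 * c / n seconds
    printf "theoretical lower bound: %.0f seconds\n", 20000 * c / n
    # practical bound: p * c * h * 1,000,000 / (b * n * m) seconds
    printf "practical lower bound:   %.0f seconds\n", p * c * h * 1000000 / (b * n * m)
}'
With these values it prints 800 and 3750 seconds, matching the examples above.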
Reducing the Recovery Time
The recovery time can be reduced by:
Increasing dfs.namenode.replication.max-streams. However, setting this value too high can affect cluster performance. Note that increasing this value beyond 4 must be evaluated carefully and also requires changing the safeguard upper limit via dfs.namenode.replication.max-streams-hard-limit.
Using more nodes with smaller disks. Total cluster capacity remaining the same, a cluster with more nodes and smaller disks will recover faster.
Avoiding predominantly small blocks.
04-11-2017
08:18 PM
@hardik desai The NameNode appears to be up from your screenshot, so it's difficult to say what went wrong. Check for errors in the service logs of an affected service instance. Also look through your NameNode logs for any errors.
04-12-2017
06:05 PM
1 Kudo
OK, then let me ask this: what do companies do to back up this directory? Do they back it up more often than others to ensure data loss is kept to a minimum?
04-05-2017
01:46 PM
I'd also post this question on the Ambari track to check why Ambari didn't detect the DataNodes going down. From your logs it is hard to say why the DataNode went down. I again recommend increasing the DataNode heap allocation via Ambari, and also check that your nodes are provisioned with a sufficient amount of RAM.
06-20-2018
04:54 PM
This worked.
05-30-2019
04:06 PM
Could anyone please tell me how I can get the core-site.xml and hdfs-site.xml files? I am building a Gradle project in which I have to create a directory on HDFS, which is on a remote server.
07-25-2017
03:41 AM
I ran into the same issue, but it was automatically fixed after restarting my DataNode server (rebooting the physical Linux server).
07-26-2016
12:16 AM
1 Kudo
You may run into slow Hadoop service starts on your OS X development laptop. You can check this by opening up your service logs and looking for large (5-10 second) gaps between successive log entries at startup.
Diagnosis
It often manifests as test failures for MiniDFSCluster-based tests that use short timeouts (<10 seconds). Here is an example from a NameNode log file with a 5 second stall at startup:
2016-07-25 14:57:37,982 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-07-25 14:57:43,060 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
Another 5 second stall during NameNode startup:
2016-07-25 14:57:48,790 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2016-07-25 14:57:53,914 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
Resolution
If you see this behavior you are likely running into an OS X bug. The fix is to put all your entries for localhost on one line, as described in this StackOverflow answer. That is, make sure your /etc/hosts file has something like this:
# Replace myhostname with the hostname of your laptop.
#
127.0.0.1 localhost myhostname myhostname.local myhostname.Home
Instead of this:
127.0.0.1 localhost myhostname.local
127.0.0.1 myhostname myhostname.Home
Root Cause
The root cause of this problem appears to be a long delay when looking up the local host name with InetAddress.getLocalHost. The following code is a minimal repro of this problem on affected systems:
import java.net.*;

class Lookup {
    public static void main(String[] args) throws Exception {
        System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
    }
}
This program can take over 5 seconds to execute on an affected machine. Verified on OS X 10.10.5 with Oracle JDK 1.8.0_91 and 1.7.0_79.
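To check whether a machine is affected, one option is to compile and time the repro above (a sketch; assumes a JDK is on the PATH and the class is saved as Lookup.java):
javac Lookup.java
time java Lookup
On an affected laptop the reported real time will be several seconds instead of well under one second.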
06-23-2017
04:29 PM
Thanks for the very useful article. Will there be any follow-ups coming? I am particularly interested in the changes that came about from https://issues.apache.org/jira/browse/HDFS-8818 and how settings like dfs.balancer.moverThreads need to be increased from their defaults when balancing a large number of unbalanced nodes (e.g. the setting used in this HDFS-8818 comment: https://issues.apache.org/jira/browse/HDFS-8818?focusedCommentId=15997429&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15997429).
11-08-2018
06:28 PM
Hi @Arpit Agarwal, In HADOOP-10597 the earliest version mentioned with the fix is 2.7.4. Is the HDP version recommended in the post correct? I'm running HDP 2.6.1 with Hadoop 2.7.3 and I would like to confirm if this parameter can be enabled or not. Thanks!
07-07-2016
04:13 AM
15 Kudos
Introduction
This article continues where part 1 left off. It describes a few more configuration settings that can be enabled in CDP, HDP, CDH, or Apache Hadoop clusters to help the NameNode scale better.
Audience
This article is for Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager you should know how to manage services and configurations. It is assumed that you have read part 1 of the article.
RPC Handler Count
The Hadoop RPC server consists of a single RPC queue per port and multiple handler (worker) threads that dequeue and process requests. If the number of handlers is insufficient, then the RPC queue starts building up and eventually overflows. You may start seeing task failures and eventually job failures and unhappy users.
It is recommended that the RPC handler count be set to 20 * log2(Cluster Size) with an upper limit of 200.
e.g. for a 250 node cluster you should initialize this to 20 * log2(250) ≈ 160. The RPC handler count can be configured with the following setting in hdfs-site.xml:
<property>
<name>dfs.namenode.handler.count</name>
<value>160</value>
</property>
This heuristic is from the excellent Hadoop Operations book. If you are using Ambari to manage your cluster, this setting can be changed via a slider in the Ambari Server Web UI. If you're using Cloudera Manager, you can search for the property name "dfs.namenode.handler.count" under the HDFS configuration page and adjust the value.
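If it helps, here is a quick way to evaluate the heuristic for your own cluster size (a rough sketch using awk; it applies the upper limit of 200 mentioned above):
awk -v nodes=250 'BEGIN {
    v = 20 * log(nodes) / log(2)   # 20 * log2(cluster size)
    if (v > 200) v = 200           # cap at the recommended upper limit
    printf "%.0f\n", v             # prints 159 for 250 nodes, i.e. roughly 160
}'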
Service RPC Handler Count
Pre-requisite: If you have not enabled the Service RPC port already, please do so first as described here.
There is no precise calculation for the Service RPC handler count; however, the default value of 10 is too low for most production clusters. We have often seen it initialized to 50% of dfs.namenode.handler.count in busy clusters, and this value works well in practice.
e.g. for the same 250 node cluster you would initialize the service RPC handler count with the following setting in hdfs-site.xml.
<property>
<name>dfs.namenode.service.handler.count</name>
<value>80</value>
</property>
DataNode Lifeline Protocol
The Lifeline protocol is a feature recently added by the Apache Hadoop Community (see Apache HDFS Jira HDFS-9239). It introduces a new lightweight RPC message that is used by the DataNodes to report their health to the NameNode. It was developed in response to problems seen in some overloaded clusters where the NameNode was too busy to process heartbeats and spuriously marked DataNodes as dead.
For a non-HA cluster, the feature can be enabled with the following setting in hdfs-site.xml (replace mynamenode.example.com with the hostname or IP address of your NameNode). The port number can be different too.
<property>
<name>dfs.namenode.lifeline.rpc-address</name>
<value>mynamenode.example.com:8050</value>
</property>
For an HA cluster, the lifeline RPC port can be enabled with settings like the following, replacing mycluster, nn1 and nn2 appropriately.
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
<value>mynamenode1.example.com:8050</value>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
<value>mynamenode2.example.com:8050</value>
</property>
Additional lifeline protocol settings are documented in the HDFS-9239 release note but these can be left at their default values for most clusters.
Note: Changing the lifeline protocol settings requires a restart of the NameNodes, DataNodes and ZooKeeper Failover Controllers to take full effect. If you have NameNode HA set up, you can restart the NameNodes one at a time followed by a rolling restart of the remaining components to avoid cluster downtime.
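After the restart, one quick way to confirm that the setting is visible in the effective configuration is to query it (a sketch; use the HA-suffixed key and your own nameservice/NameNode IDs where applicable):
hdfs getconf -confKey dfs.namenode.lifeline.rpc-address
hdfs getconf -confKey dfs.namenode.lifeline.rpc-address.mycluster.nn1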
Conclusion
That is it for Part 2. In Part 3 of this article, we will explore how to enable two new HDFS features to help your NameNode scale better.
10-04-2018
09:53 AM
@Elias Abacioglu You can refer to the guidance below for configuring the service RPC port. https://community.hortonworks.com/articles/223817/how-do-you-enable-namenode-service-rpc-port-withou.html
06-20-2016
09:55 AM
Option 1: Reformat. You will need not only to "copyFromLocal" the data again but also to recreate the file system. See, for example, this for details.
Option 2: Exit safe mode and find out where you are. I'd recommend this one. You can also find out what caused the trouble; maybe all the corrupted blocks are on a bad disk or something like that. You can share the list of files you are uncertain whether to restore or not.
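For option 2, a minimal sketch of the commands involved (assumes you run them as the HDFS superuser):
hdfs dfsadmin -safemode leave
hdfs fsck / -list-corruptfileblocks
The fsck output lists the files with missing or corrupt blocks, which should help you decide what to restore.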
06-13-2016
04:22 PM
1 Kudo
Hi @Geetha Anne, here's the official documentation from the Ambari team on how to set up a local cluster using Vagrant VMs. This is more likely to be up to date. https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
11-05-2017
08:52 AM
Hi @Kuldeep Kulkarni, I have tried the same. Even after installing the Kerberos client manually, I get the same error. I'm not sure why the Test Kerberos Client check fails; I need to skip it and go to the second page. All the hosts have the Kerberos client installed (Kerberos Clients: 3, Kerberos Clients Installed).
10-28-2015
05:22 PM
4 Kudos
Configuring a separate service RPC port can improve the responsiveness of the NameNode by allowing DataNode and client requests to be processed via separate RPC queues. Adding a service RPC port to an HA cluster with automatic failover via ZKFCs requires a workaround step. The steps to configure a service RPC port are as follows:
1. Add the following settings to hdfs-site.xml.
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
<value>nn1.example.com:8040</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
<value>nn2.example.com:8040</value>
</property>
2. Restart the NameNodes.
3. Stop the ZKFC processes on both NameNodes.
4. Run the following command to reset the ZKFC state in ZooKeeper:
hdfs zkfc -formatZK
Without this step you will see the following exception after the ZKFC restart:
java.lang.RuntimeException: Mismatched address stored in ZK for NameNode
5. Restart the ZKFCs.
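Once the ZKFCs are back up, one way to confirm that the new addresses are visible in the effective configuration is to query them (a sketch using the example names above):
hdfs getconf -confKey dfs.namenode.servicerpc-address.mycluster.nn1
hdfs getconf -confKey dfs.namenode.servicerpc-address.mycluster.nn2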