Member since: 07-30-2019
Posts: 111
Kudos Received: 181
Solutions: 35
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 1854 | 02-07-2018 07:12 PM
 | 1301 | 10-27-2017 06:16 PM
 | 1778 | 10-13-2017 10:30 PM
 | 3355 | 10-12-2017 10:09 PM
 | 698 | 06-29-2017 10:19 PM
12-01-2020
02:52 PM
The DataNodes should run the same software version as the NameNode.
08-01-2019
10:28 AM
"I'm assuming you mean just to store the metadata of the changed snapshot, which isn't significant given the actual size of data held (in reference to my example above)." Correct. However, the metadata is tracked in NameNode memory, which is a precious resource. The overhead can be significant in a large cluster with many files and millions of deltas.
08-01-2019
09:36 AM
The snapshot will not occupy any storage space on disk or NameNode heap immediately after it is created. However, any subsequent changes inside the snapshottable directory will need to be tracked as deltas, and that can result in both higher disk space and NameNode heap usage. E.g. if a file is deleted after taking a snapshot, its blocks cannot be reclaimed because the file is still accessible through the snapshot path. The hadoop fs -du shell command supports a -x option that calculates directory space usage excluding snapshots. The delta between the output with and without the -x option tells you how much disk space is being consumed by snapshots.
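For example, something like this (the /data path below is a placeholder for your snapshottable directory):

# Space usage including files retained only by snapshots
hadoop fs -du -s /data
# Space usage excluding snapshots (-x)
hadoop fs -du -s -x /data
# The difference between the two numbers is the space held only by snapshots.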
02-27-2018
10:20 PM
2 Kudos
Building Apache Tez with Apache Hadoop 2.8.0 or later fails due to the client/server jar separation in Hadoop [1]. The build fails with the following error:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30] cannot find symbol
  symbol:   class DistributedFileSystem
  location: package org.apache.hadoop.hdfs
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50] cannot find symbol
  symbol:   class DistributedFileSystem
  location: class org.apache.tez.client.TestTezClientUtils
[ERROR] /src/tez/ecosystem/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42] cannot access org.apache.hadoop.hdfs.DistributedFileSystem

To get Tez to compile successfully, use the new hadoop28 profile introduced by TEZ-3690 [2]. E.g. here is how you compile Tez against Apache Hadoop 3.0.0:

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Phadoop28 -Dhadoop.version=3.0.0

References:
1. HDFS-6200: Create a separate jar for hdfs-client
2. TEZ-3690: Tez on hadoop 3 build failed due to hdfs client/server jar separation
02-07-2018
07:12 PM
1 Kudo
There is no such thing as a "passive" NameNode. Are you asking about the HA or non-HA configuration? In an HA configuration there is an Active NameNode that serves user requests, and a Standby NameNode that generates periodic checkpoints and can take over the Active role if the previously active NameNode dies or becomes unresponsive. In a non-HA configuration there is a Primary NameNode that serves user requests, and a Secondary NameNode that generates periodic checkpoints. A Secondary NameNode can never become the Primary. The terminology is unfortunately confusing.
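As an aside, on an HA cluster you can check which NameNode currently holds the Active role with hdfs haadmin; nn1 and nn2 below are placeholders for the IDs defined in dfs.ha.namenodes.<nameservice>:

# Report the HA state (active/standby) of each configured NameNode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2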
10-27-2017
06:16 PM
Try clearing up some snapshots. You probably have a ton of deleted files retained for snapshots.
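A minimal sketch of how to find and remove snapshots you no longer need (the /data path and snapshot name are placeholders):

# List snapshottable directories visible to the current user
hdfs lsSnapshottableDir
# Inspect the snapshots under a snapshottable directory
hdfs dfs -ls /data/.snapshot
# Delete a snapshot that is no longer needed
hdfs dfs -deleteSnapshot /data s20171001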
10-27-2017
06:07 PM
Did you enable security using the Ambari Kerberos wizard? That usually takes care of these settings for you.
10-27-2017
04:58 PM
A few things to check for:
Are you starting the DataNode process as root?
Have you set HADOOP_SECURE_DN_USER and JSVC_HOME?
Since you are using a privileged port number (<1024), ensure you have not set dfs.data.transfer.protection.
The Apache Hadoop documentation on secure DataNode setup is good: https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataNode
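For reference, a minimal hadoop-env.sh sketch for a jsvc-based secure DataNode; the user name and JSVC_HOME path are placeholders for your environment:

# hadoop-env.sh on each DataNode (illustrative values)
export HADOOP_SECURE_DN_USER=hdfs         # unprivileged user the DataNode drops to after binding ports
export JSVC_HOME=/usr/lib/bigtop-utils    # placeholder: directory containing the jsvc binary
# dfs.datanode.address and dfs.datanode.http.address in hdfs-site.xml must then
# use privileged ports (<1024), e.g. 0.0.0.0:1004 and 0.0.0.0:1006.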
10-26-2017
07:12 PM
It is likely the process has not hit an allocation failure yet so GC has not kicked in. This is perfectly normal. If you want the heap usage to be lower then you can reduce the heap allocation. Alternatively you can trigger GC quicker by adding something like -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly to your heap options. However it's probably best to just follow our suggested heap configuration and let the Java runtime do the rest. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/configuring-namenode-heap-size.html
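If you do decide to trigger GC earlier, a hedged sketch of where those flags would go (Advanced hadoop-env via Ambari); the -Xmx value here is illustrative and should come from the sizing guide above:

# hadoop-env.sh (illustrative; append to your existing NameNode options)
export HADOOP_NAMENODE_OPTS="-Xmx4g -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  ${HADOOP_NAMENODE_OPTS}"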
10-26-2017
07:04 PM
What is your HDP version?
10-16-2017
08:33 PM
Hi @Sedat Kestepe, HDD reliability is measured as AFR (annualized failure rate) - the probability a hard disk will fail in a given year. AFR varies with manufacturer, model number and operating conditions. Here is one publicly available report about disk AFRs: https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/ There is one obvious bad batch in that report with 33% AFR. For the rest it varies from 0 - 3%. Your hardware vendor should be able to provide you with the expected AFR. If your observed failure rate is higher than expected you may have a bad batch of hardware and should check with your vendor. Burn-in testing can help weed out bad hardware early on.
10-13-2017
10:30 PM
@Dr. Jason Breitweg, it will not be deleted automatically. There may be block files under that directory that you need. If the cluster has any important data, I'd recommend running 'hdfs fsck' to ensure there are no missing/corrupt blocks before you delete /var/hadoop/hdfs/data/current/BP-*. Even then, I'd first move the directory to a different location, restart the DataNodes, and rerun fsck to ensure you don't cause data loss.
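A minimal sketch of that check-then-move sequence (the quarantine destination is a placeholder):

# 1. Confirm the namespace is healthy before touching anything
hdfs fsck / | tail -n 20     # look for "The filesystem under path '/' is HEALTHY"
# 2. Move (do not delete) the old block pool directory, then restart the DataNode
mkdir -p /var/hadoop/hdfs/quarantine     # placeholder destination
mv /var/hadoop/hdfs/data/current/BP-* /var/hadoop/hdfs/quarantine/
# 3. Re-run fsck; only remove the moved directory once no blocks are reported missing or corrupt
hdfs fsck / -list-corruptfileblocks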
10-12-2017
10:09 PM
1 Kudo
@Bharath N, you don't need to purge edit log files. HDFS deletes them automatically when they are no longer needed. I recommend not touching any edit log files, since accidentally deleting the wrong file could lead to data loss. For your NameNode start problem, I can't say anything without more data. You may want to engage Hortonworks support if you have a support contract. Otherwise you can post the error message/exception here and we may be able to point you in the right direction.
06-30-2017
12:05 AM
Hi @steve coyle, have you upgraded your Linux kernel recently? If so, it is likely this issue: https://issues.apache.org/jira/browse/HDFS-12029 You can either roll back the kernel version, upgrade to a newer kernel that fixes this regression (assuming your vendor has one), or apply the workaround mentioned in the description of that Jira.
06-30-2017
12:02 AM
It would be helpful if you attach the complete hdfs-site.xml from both clusters (you can anonymize hostnames and IP addresses).
06-29-2017
10:19 PM
1 Kudo
Hi @Rohit Masur, if you want to set up a vagrant box pre-installed with Apache Hadoop 3.0, that should be fine. Apache Hadoop 3.0 is a fast-moving target right now (still in Alpha), though, so it may be better to focus on good documentation. If the existing installation guide is wrong, please do call that out on the hadoop-user mailing list. Even better, file an Apache Hadoop Jira and post a patch to fix the documentation. Feel free to tag me on the Apache Jira if you need any help submitting the patch.
06-22-2017
07:46 PM
7 Kudos
The HDFS NameNode ensures that each block is sufficiently replicated. When it detects the loss of a DataNode, it instructs remaining nodes to maintain adequate replication by creating additional block replicas.
For each lost replica, the NameNode picks a (source, destination) pair where the source is an available DataNode with another replica of the block and the destination is the target for the new replica. The re-replication work can be massively parallelized in large clusters since the replica distribution is randomized.
In this article, we estimate a lower bound for the recovery time.

Simplifying Assumptions
The maximum IO bandwidth of each disk is 100 MB/s (reads + writes). This is true for the vast majority of clusters that use spinning disks.
The aggregate IO capacity of the cluster is limited by disk and not the network. This is not always true but helps us establish lower bounds without discussing network topologies.
Block replicas are uniformly distributed across the cluster and disk usage is uniform. True if the HDFS balancer was run recently.
Theoretical Lower Bound
Let's assume the cluster has n nodes. Each node has p disks, and the usage of each disk is c terabytes. The data usage of each node is thus (p ⋅ c) TB.
The amount of data transfer needed for recovery is twice the capacity of the lost DataNode, as each replica must be read once from a source disk and written once to the target disk.
Data transfer during recovery = 2 ⋅ (Node Capacity)
= (2 ⋅ p ⋅ c) TB
= (2 ⋅ p ⋅ c ⋅ 1,000,000) MB
The re-replication rate is limited by the available aggregate IO bandwidth in the cluster:

Cluster aggregate IO bandwidth = (Disk IO bandwidth) ⋅ (Number of disks)
= (100 ⋅ n ⋅ p) MB/s

Thus:

Minimum Recovery Time = (Data transfer during recovery) / (Cluster aggregate IO bandwidth)
= (2 ⋅ p ⋅ c ⋅ 1,000,000) / (100 ⋅ n ⋅ p)
= (20,000 ⋅ c/n) seconds
where:
c = Mean usage of each disk in TB.
n = Number of DataNodes in the cluster.

This is the absolute best case with no other load, no network bandwidth limits, and a perfectly efficient scheduler.
E.g. in a 100-node cluster where each disk holds 4 TB of data, recovery from the loss of a DataNode takes at least (20,000 ⋅ 4) / 100 = 800 seconds, or approximately 13 minutes.
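A quick sanity check of the theoretical bound in bash, using only the numbers from this example:

# Theoretical lower bound: 20,000 * c / n seconds
c=4     # mean usage of each disk, in TB
n=100   # number of DataNodes
echo "Theoretical minimum recovery time: $(( 20000 * c / n )) seconds"
# prints 800 seconds (~13 minutes)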
Clearly, the cluster size bounds the recovery time. Disk capacities being equal, a 1000-node cluster can recover 10x faster than a 100-node cluster.

A More Practical Lower Bound
The theoretical lower bound assumes that block re-replications can be scheduled instantaneously across the cluster. It also assumes that all cluster IO capacity is available for re-replication, whereas in practice application reads and writes also consume IO capacity.

To throttle re-replication traffic, the NameNode schedules 2 outbound replication streams per DataNode per heartbeat interval. This throttle allows DataNodes to remain responsive to applications and can be adjusted via the configuration setting dfs.namenode.replication.max-streams. Let's call this m and the heartbeat interval h.
Also, let's assume the mean block size in the cluster is b MB. Then:

Re-replication Rate = (Blocks replicated cluster-wide per heartbeat interval) / (Heartbeat interval)
= (n ⋅ m / h) blocks/s

The total number of blocks to be re-replicated is the capacity of the lost node divided by the mean block size:

Number of Blocks Lost = (p ⋅ c) TB / b MB
= (p ⋅ c ⋅ 1,000,000 / b)
Thus:
Recovery Time = (Number of Blocks Lost) / (Re-replication Rate)
= (p ⋅ c ⋅ 1,000,000) / (b ⋅ n ⋅ m/h)
= (p ⋅ c ⋅ h ⋅ 1,000,000) / (b ⋅ n ⋅ m) seconds.
where:
p = Number of disks per node.
c = Mean usage of each disk in TB.
h = Heartbeat interval (default = 3 seconds).
b = Mean block size in MB.
n = Number of DataNodes in the cluster.
m = dfs.namenode.replication.max-streams (default = 2)
Simplifying by plugging in the defaults for h and m, we get
Minimum Recovery Time (seconds) = (p ⋅ c ⋅ 1,500,000) / (b ⋅ n)
E.g. in the same cluster, assuming the mean block size is 128 MB and each node has 8 disks, the practical lower bound on recovery time is 3,750 seconds, or roughly an hour.
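The same sanity check for the practical bound, again using only the values assumed in this article:

# Practical lower bound with defaults h=3, m=2: p * c * 1,500,000 / (b * n) seconds
p=8     # disks per node
c=4     # mean usage of each disk, in TB
b=128   # mean block size, in MB
n=100   # number of DataNodes
echo "Practical minimum recovery time: $(( p * c * 1500000 / (b * n) )) seconds"
# prints 3750 seconds (~1 hour)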
Reducing the Recovery Time

The recovery time can be reduced by:
Increasing dfs.namenode.replication.max-streams. However, setting this value too high can affect cluster performance. Note that increasing this value beyond 4 must be evaluated carefully, and it also requires raising the safeguard upper limit via dfs.namenode.replication.max-streams-hard-limit (see the configuration sketch after this list).
Using more nodes with smaller disks. Total cluster capacity remaining the same, a cluster with more nodes and smaller disks will recover faster.
Avoiding predominantly small blocks.
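A hedged hdfs-site.xml sketch of what raising the throttle might look like; the values are illustrative, not recommendations:

<!-- hdfs-site.xml (illustrative values; evaluate the impact before raising) -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>4</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>8</value>
</property>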
06-21-2017
09:58 PM
Thanks for the heads up @Namit Maheshwari. I don't have a better solution in mind than what @Mark Davis already described.
06-12-2017
05:12 PM
1 Kudo
@Laurent Edel this answer is incorrect. Please consider editing it to mention decommissioning. Otherwise someone may assume it's OK to just remove nodes if they have rack awareness.
06-12-2017
05:06 PM
2 Kudos
Don't just remove the DataNodes. Even with rack awareness, removing >2 nodes from different racks will lead to data loss. Instead, you should decommission them first as described here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_administration/content/ref-a179736c-eb7c-4dda-b3b4-6f3a778bd8c8.1.html You may know this already, but I want to make it clear for others who read this discussion in the future.
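A rough sketch of the decommissioning flow from that link, assuming dfs.hosts.exclude on the NameNode already points at an exclude file (the hostname and path below are placeholders):

# 1. Add the DataNodes to be removed to the exclude file referenced by dfs.hosts.exclude
echo "datanode-07.example.com" >> /etc/hadoop/conf/dfs.exclude
# 2. Ask the NameNode to re-read the include/exclude lists and start decommissioning
hdfs dfsadmin -refreshNodes
# 3. Wait until the nodes report "Decommissioned" before shutting them down
hdfs dfsadmin -report | grep "Decommission Status"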
05-22-2017
08:39 PM
Sorry, I missed the notification of your reply. That is also a good question. I have not yet come across a customer setup where HDFS audit logging is disabled and Ranger audit logging is on. I'd recommend tagging someone from Ranger to make sure.
05-08-2017
02:56 PM
@Ward Bekker we don't recommend disabling HDFS audit logging. It's hard to debug many HDFS issues without the audit log. Just curious, why would you like to disable it?
04-26-2017
05:25 PM
Hi @suresh krish, the Kerberos Principals section from the Apache Hadoop docs should answer some of your questions. I found the first few chapters of the book Hadoop Security to be a readable introduction to this complex topic.
04-25-2017
09:24 PM
1 Kudo
Hi @Mark Heydenrych, it is likely that your DataNodes are not configured with sufficient Java heap. Even though there is free RAM on the machine, the Java runtime will not use memory beyond its configured maximum heap size, which is specified via the -Xmx command-line option. You may be seeing this on only a few DataNodes because they wound up with more blocks. This setting can be changed via the HADOOP_DATANODE_OPTS environment variable in Advanced hadoop-env.sh via Ambari.

I recommend starting by doubling the heap allocation for the DataNode and also adding the following options if not present already: -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:ConcGCThreads=8 -XX:+UseConcMarkSweepGC -XX:PermSize=128m -XX:MaxPermSize=256m Also, the new generation heap allocation (configured via -XX:MaxNewSize=) should be set to 1/8th of the total process heap size.

Another recommendation is to run the HDFS balancer to redistribute block replicas across the cluster. https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
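To make that concrete, a hedged sketch of the resulting hadoop-env.sh entry; the 8 GB heap and 1 GB new size are illustrative only and should be sized for your block count:

# Advanced hadoop-env.sh via Ambari (illustrative sizes, not recommendations)
export HADOOP_DATANODE_OPTS="-Xmx8g -XX:MaxNewSize=1g \
  -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=8 \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:PermSize=128m -XX:MaxPermSize=256m ${HADOOP_DATANODE_OPTS}"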
04-25-2017
09:16 PM
1 Kudo
Nobody. Periodic checkpointing is suspended in an HA setup when the Standby NameNode is down.
04-24-2017
08:13 PM
1 Kudo
Hi @Michael Häusler, this may be caused by HDFS-9958. I see that HDFS-9958 is not fixed in HDP 2.4.2 but it was fixed in HDP 2.4.3. If you can see this consistently I'd recommend upgrading to check whether that fixes the problem. If you have a support contract we can provide you with a hotfix release.
04-23-2017
08:43 PM
Check your rack setting for the DataNode. If you don't see the problem, you can post the output of the following command and someone may be able to point out the error:

hdfs dfsadmin -report
04-20-2017
05:17 PM
Hi @Raaj M, your fs.defaultFS should point to a nameservice. Since your nameservice is `ha-cluster`, fs.defaultFS should be:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ha-cluster</value>
</property>

After fixing this, try stopping all services and reformatting your ZK node as described here: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Initializing_HA_state_in_ZooKeeper
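For reference, the ZooKeeper HA state initialization described in that link comes down to running the following on one of the NameNode hosts while the services are stopped:

# Initialize (or reformat) the HA state znode in ZooKeeper
hdfs zkfc -formatZK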
04-18-2017
02:46 PM
Hi @Sedat Kestepe, take a look at rack awareness. https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html Here's how you can configure racks using Ambari https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Users_Guide/content/ch03s11.html HDFS will avoid placing all block replicas in the same rack to avoid data loss in case of a rack failure. You may be able to use this to achieve what you want.
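In case it helps, a minimal sketch of manual (non-Ambari) rack configuration: a topology script referenced from core-site.xml via net.topology.script.file.name. The script path and subnet-to-rack mapping below are placeholders:

#!/bin/bash
# /etc/hadoop/conf/topology.sh (placeholder path); maps each host/IP argument to a rack
while [ $# -gt 0 ]; do
  case "$1" in
    10.0.1.*) echo -n "/rack1 " ;;       # placeholder mapping
    10.0.2.*) echo -n "/rack2 " ;;
    *)        echo -n "/default-rack " ;;
  esac
  shift
done
echo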
04-12-2017
03:12 PM
@Sami Ahmad if you have a support contract I recommend you reach out to our support team. We have directory-level data protection that can be optionally enabled. However a determined privileged user can wipe user data.