Member since: 10-21-2015
Posts: 59
Kudos Received: 31
Solutions: 16

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3119 | 03-09-2018 06:33 PM |
| | 2795 | 02-05-2018 06:52 PM |
| | 14001 | 02-05-2018 06:41 PM |
| | 4414 | 11-30-2017 06:46 PM |
| | 1670 | 11-22-2017 06:20 PM |
08-31-2018
06:44 PM
Thanks for the update; glad you were able to make it work, and thanks for sharing it with the community.
04-26-2018
06:22 PM
@Sriram Hadoop

> How can I change the block size for the existing files in HDFS? I want to increase the block size.

May I ask what you are trying to achieve? We might be able to make better suggestions if we know what problem you are trying to solve.
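For context, a minimal sketch of one common approach, not from the original reply and assuming the goal is simply a larger block size for files that already exist: an existing file keeps the block size it was written with, so it has to be rewritten. The paths and the 256 MB value are placeholders.

```bash
# Rewrite a file with a 256 MB block size (value in bytes); existing HDFS
# files keep their original block size, so a copy is required.
hdfs dfs -D dfs.blocksize=268435456 -cp /data/file.dat /data/file.dat.tmp

# Verify the block size of the copy, then swap it in place of the original.
hdfs fsck /data/file.dat.tmp -files -blocks | head
hdfs dfs -rm /data/file.dat
hdfs dfs -mv /data/file.dat.tmp /data/file.dat
```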
04-26-2018
06:19 PM
1 Kudo
@Michael Bronson

> 1. Is it safe to run e2fsck -y /dev/sdf in order to repair the /dev/sdf file system?

Datanodes need to be able to read and write to the underlying file system, so if there is an error in the file system we have no choice but to fix it. That said, HDFS will have the same blocks on other machines, so you can put this node into maintenance mode in Ambari and fix the file system errors. There is a possibility of losing some data blocks, so if you have this error on more than one datanode, please do this one node at a time, with some time in between. I would run fsck and then reboot the datanode machine to make sure everything is okay before starting work on the next node.

> 2. Is it necessary to do some other steps after running e2fsck -y /dev/sdf?

Not from the HDFS point of view. As I said, I would make sure I am doing this datanode by datanode and not in parallel.
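A rough sketch of that per-node sequence, assuming the disk can be unmounted; the device name and mount point are placeholders, and the Ambari steps are done from the UI, not from this script.

```bash
# Run on one datanode at a time, with the node in maintenance mode in Ambari.

# Unmount the affected disk and repair its file system.
umount /dev/sdf
e2fsck -y /dev/sdf

# Remount (or reboot the machine) and restart the DataNode from Ambari.
mount /dev/sdf /grid/5

# Before moving on to the next node, confirm HDFS reports no missing or
# corrupt blocks.
hdfs fsck / | grep -iE 'missing|corrupt'
```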
03-09-2018
06:34 PM
@smdas Sorry, I forgot to tag you.
03-09-2018
06:33 PM
A quick search in the code base tells me that we have the following policies:

- AvailableSpaceBlockPlacementPolicy
- BlockPlacementPolicyDefault
- BlockPlacementPolicyRackFaultTolerant
- BlockPlacementPolicyWithNodeGroup
- BlockPlacementPolicyWithUpgradeDomain

> yet didn't find any documentation listing the available choices.

You are absolutely right, we can certainly do better at documenting this. Thanks for pointing this out. I will address this in an Apache JIRA.
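A small sketch, not from the original reply: the policy in use is normally selected via the dfs.block.replicator.classname key in hdfs-site.xml (verify the exact key name for your HDFS version).

```bash
# Show the block placement policy the client configuration resolves to.
hdfs getconf -confKey dfs.block.replicator.classname

# To switch, set the key to the fully qualified class name in hdfs-site.xml,
# e.g. org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy,
# and restart the NameNode.
```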
02-14-2018
05:07 PM
@PJ I am guessing that it could be related to the "dfs.namenode.startup.delay.block.deletion.sec" value, since you mention that you restarted the cluster.
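A quick way to check what the cluster is using, assuming the full key name is dfs.namenode.startup.delay.block.deletion.sec (worth verifying for your version):

```bash
# Print the configured startup block-deletion delay, in seconds.
hdfs getconf -confKey dfs.namenode.startup.delay.block.deletion.sec
```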
02-05-2018
06:52 PM
@Malay Sharma It really depends on your workload. If you have really large files, set your block size to 256 MB (this reduces the number of blocks in the file system, since you are storing more data per block); otherwise use the default 128 MB. If you have a running cluster, SmartSense can give you a view of the file size distribution in your current cluster. That will give you an idea of whether you need to tune the block size for performance. From experience, I can tell you that the performance of your cluster is not going to depend on block size unless you have a very large number of files in the system. Also, unlike a physical file system, setting the block size to 128 MB does not mean that each block write will use up 128 MB. HDFS only uses the number of bytes actually written, so there is no wasted space because of the block size.
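A small sketch for checking what a cluster is actually doing; the path is a placeholder and this is not from the original reply. A file's length (%b) can be far smaller than its block size (%o), which is the "no wasted space" point above.

```bash
# Check the block size recorded for an existing file and its actual length.
hdfs dfs -stat "%o bytes block size, %b bytes long: %n" /data/part-00000

# The cluster-wide default block size (bytes) comes from dfs.blocksize.
hdfs getconf -confKey dfs.blocksize
```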
02-05-2018
06:41 PM
2 Kudos
@Malay Sharma HDFS keeps all file names and block addresses cached in memory. This is done at the Namenode level, and it is what makes HDFS so fast: modifications to the file system and block-location lookups can all be served without any disk I/O.

This design choice of keeping all metadata in memory at all times has certain trade-offs. One of them is that we need to spend a few hundred bytes of Namenode heap per file and per block. This becomes an issue once the file system holds 500 to 700 million files: the amount of RAM that the Namenode needs to reserve becomes large, typically 256 GB or more. At this size the JVM is hard at work too, since it has to do things like garbage collection. There is another dimension as well: with 700 million files, it is quite possible that your cluster is serving 30-40K or more requests per second, which also creates a lot of memory churn. So a large number of files, combined with lots of file system requests, makes the Namenode a bottleneck in HDFS; in other words, the metadata that we need to keep in memory creates the bottleneck.

There are several solutions and works in progress to address this problem:

- HDFS federation, which is being shipped as part of HDP 3.0, allows many Namenodes to work against a set of Datanodes.
- HDFS-7240 is trying to separate the block space from the namespace, which would immediately double or quadruple the effective size of the cluster.
- A good document that tracks the various issues and different approaches to scaling the Namenode is "Uber scaling namenode".
- There is also an approach that sends read workloads to the standby Namenode, freeing up the active Namenode and thus scaling it better. That work is tracked in "Consistent Reads from Standby Node".

Please let me know if you have any other questions.

Thanks
Anu
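To make the memory point concrete, a rough back-of-the-envelope estimate; the ~150 bytes per metadata object is a commonly quoted rule of thumb, not a figure from the post.

```bash
# Rough Namenode heap estimate for metadata alone (no JVM/GC headroom).
FILES=700000000        # 700 million files
OBJECTS_PER_FILE=2     # assume roughly one file object plus one block object
BYTES_PER_OBJECT=150   # rule-of-thumb bytes of heap per metadata object

HEAP_BYTES=$(( FILES * OBJECTS_PER_FILE * BYTES_PER_OBJECT ))
echo "~$(( HEAP_BYTES / 1024 / 1024 / 1024 )) GB of heap for metadata"
# Prints roughly 195 GB, which is in line with the 256 GB+ heaps mentioned
# above once JVM overhead and GC headroom are added.
```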
12-05-2017
06:18 PM
From the error message, it looks like some of the services might not be running. Can you please make sure that ZooKeeper and the JournalNodes are indeed running before starting the NameNode?
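A few quick checks one could run on the relevant hosts; the host names are placeholders, assuming the default ZooKeeper (2181) and JournalNode RPC (8485) ports.

```bash
# Is ZooKeeper answering? "imok" means the server is up.
echo ruok | nc zk-host-1 2181

# Are the ZooKeeper and JournalNode JVMs running on this host?
jps | grep -E 'QuorumPeerMain|JournalNode'

# Is the JournalNode RPC port reachable from the NameNode host?
nc -z jn-host-1 8485 && echo "JournalNode RPC port reachable"
```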
12-01-2017
06:23 PM
1 Kudo
@Sedat Kestepe Since you don't care about the data, from an HDFS perspective it is easier to reinstall your cluster. If you insist, I can walk you through the recovery steps, but if I were you I would just reinstall at this point.