Member since: 02-05-2018 | Posts: 5 | Kudos Received: 0 | Solutions: 0
02-06-2018 02:24 AM
What happens when a client tries to read (SELECT) a block that is already open for write (INSERT/DELETE):

1) The client requests to read a block of an HDFS file.
2) If that block is already open for write, the read waits until the write operation completes, because the block's start/end offsets can change while the write is in progress.
3) The client retries up to "dfs.client.failover.max.attempts" (set in hdfs-site.xml), e.g. 10 attempts. If the write completes in the meantime, the client reads the block and completes.
4) If the client still cannot read within that maximum number of attempts, the read request fails.
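The bounded-retry behavior in steps 3 and 4 can be sketched with a generic retry loop. This is a minimal illustration, not the actual HDFS client code; the function name and parameters are made up for the example, with `max_attempts` standing in for `dfs.client.failover.max.attempts`.

```python
import time

def read_block_with_retry(read_fn, max_attempts=10, backoff_s=0.0):
    """Sketch of the bounded-retry read described above: keep attempting
    the read until it succeeds or max_attempts is exhausted (analogous to
    dfs.client.failover.max.attempts in hdfs-site.xml)."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Succeeds once the writer has finished and the block is stable.
            return read_fn()
        except IOError:
            if attempt == max_attempts:
                # Step 4: the read request fails after the final attempt.
                raise
            # Step 3: wait a bit, then retry while the write completes.
            time.sleep(backoff_s)
```

If the write finishes during the retry window the read returns normally; otherwise the last `IOError` propagates to the caller.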
02-05-2018 06:52 PM
@Malay Sharma It really depends on your workload. If you have really large files, set the block size to 256 MB (this reduces the number of blocks in the file system, since you store more data per block); otherwise use the default 128 MB. If you have a running cluster, SmartSense can show you the file size distribution of your current cluster, which tells you whether you need to tune the block size for performance. From experience, I can tell you that cluster performance will not depend on block size unless you have a very large number of files in the system. Also, unlike a physical file system, setting the block size to 128 MB does not mean that each block write uses up 128 MB. HDFS only uses the number of bytes actually written, so there is no wasted space because of the block size.
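The effect of block size on block count (and hence on Namenode metadata) can be shown with simple arithmetic. This is just ceiling division, sketched for illustration; the function name is invented for the example.

```python
def num_blocks(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks a file occupies. The last block holds only
    the remaining bytes, so a large block size wastes no disk space."""
    return max(1, -(-file_size_bytes // block_size_bytes))  # ceiling division

MB = 1024 * 1024
one_gb_file = 1024 * MB
print(num_blocks(one_gb_file, 128 * MB))  # 8 blocks at the default size
print(num_blocks(one_gb_file, 256 * MB))  # 4 blocks: half the metadata objects
print(num_blocks(1 * MB, 256 * MB))       # 1 block, still using only ~1 MB on disk
```

Doubling the block size halves the block count for large files, which is exactly the metadata saving described above; tiny files occupy one block regardless, and only their actual bytes.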
02-05-2018 06:41 PM | 2 Kudos
@Malay Sharma HDFS caches all file names and block locations in memory, at the Namenode level. This is what makes HDFS so fast: modifications to the file system and lookups of a file's block locations can all be served with no disk I/O.

This design choice of keeping all metadata in memory at all times has trade-offs. One of them is that we need to spend some bytes (think hundreds of bytes) per file and per block. This leads to an issue in HDFS: once a file system holds 500 to 700 million files, the amount of RAM that the Namenode must reserve becomes large, typically 256 GB or more. At this size, the JVM is hard at work too, since it has to do things like garbage collection. There is another dimension as well: with 700 million files, it is quite possible that your cluster is serving 30-40K or more requests per second, which also creates lots of memory churn. So a large number of files, combined with lots of file system requests, makes the Namenode a bottleneck in HDFS. In other words, the metadata that we need to keep in memory creates a bottleneck in HDFS.

There are several solutions and works in progress to address this problem. HDFS federation, which is being shipped as part of HDP 3.0, allows many Namenodes to work against one set of Datanodes. In HDFS-7240, we are trying to separate the block space from the namespace, which would immediately double or quadruple the effective size of the cluster. A good document that tracks the various issues and different approaches to scaling the Namenode is "Uber scaling namenode". There is another approach where we send the read workloads to the standby Namenode, freeing up the active Namenode and thus scaling it better; that work is tracked in "Consistent Reads from Standby Node".

Please let me know if you have any other questions.

Thanks
Anu
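The "hundreds of bytes per file and per block" point can be turned into a back-of-the-envelope heap estimate. The per-object byte costs and blocks-per-file ratio below are illustrative assumptions, not measured Hadoop internals, but with these numbers 700 million files lands in the same ballpark as the 256 GB figure mentioned above.

```python
def namenode_heap_estimate_gb(num_files, blocks_per_file=1.1,
                              bytes_per_file=150, bytes_per_block=150):
    """Rough Namenode heap estimate: each file costs an in-memory inode
    object plus its block objects. Per-object costs here are ASSUMED
    values (~150 bytes each) chosen only for illustration."""
    total_bytes = num_files * (bytes_per_file + blocks_per_file * bytes_per_block)
    return total_bytes / (1024 ** 3)

# With these assumptions, 700 million files need on the order of 200 GB
# of metadata heap, before any JVM garbage-collection overhead.
print(round(namenode_heap_estimate_gb(700_000_000)))
```

This also shows why federation helps: splitting the namespace across several Namenodes divides `num_files` per node, and with it the heap each JVM must manage.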