Member since 02-05-2018
5 Posts · 0 Kudos Received · 0 Solutions
02-06-2018
02:24 AM
What happens when a client tries to read (SELECT) a file that is already open for write (INSERT/DELETE):
1) The client requests to read a block from the HDFS file system.
2) That block is already open for write, so the read waits until the write operation completes, because the block's start/end boundaries can change while it is being written.
3) The client retries up to "dfs.client.failover.max.attempts" (set in hdfs-site.xml). For example, with a value of 10 it makes up to 10 read attempts; if the write completes in the meantime, the read succeeds.
4) If the client cannot read the block within that maximum number of attempts, the read request fails.
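A minimal Java sketch of the client side, assuming the Hadoop client libraries are on the classpath and a reachable cluster; /data/sample.txt is a hypothetical file that may still be open for write, and 10 is just an illustrative value for the property mentioned above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same property the post refers to; 10 is an illustrative attempt count.
        conf.setInt("dfs.client.failover.max.attempts", 10);

        try (FileSystem fs = FileSystem.get(conf);
             // Open may have to wait/retry if the block is still being written.
             FSDataInputStream in = fs.open(new Path("/data/sample.txt"))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
    }
}
```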
10-16-2018
04:22 PM
Write operation:
* The client node interacts with the NameNode.
* The NameNode provides the addresses of the DataNodes on which the write operation should take place.
* Once the client receives the DataNode addresses, it starts to write the data directly to a DataNode.
* The client node does not write to the replicas of a block; it writes to only one block.
* The DataNodes (slaves) know how to copy the data from each other; the slave nodes replicate the block among themselves.

At the API level, the client sends a request to the DistributedFileSystem to fetch the addresses of the blocks on the slaves. The write operation uses FSDataOutputStream to write data to a block. As soon as the block is written to the first DataNode, replication begins in order to keep the copies of the block consistent across the different DataNodes. Once the replication pipeline completes, an acknowledgment is sent from the last node to the previous node and so on, until it reaches the first node, which sends the final acknowledgment back to the FSDataOutputStream.

If a particular DataNode crashes, the NameNode is intelligent enough to send the address of an active DataNode hosting that block so the write operation can proceed smoothly. All along, the NameNode is in constant communication with each of the DataNodes.
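A minimal write sketch, assuming the Hadoop client libraries and a reachable cluster; /data/output.txt is a hypothetical target path. It shows where FSDataOutputStream fits into the flow described above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             // create() consults the NameNode for block placement; the returned
             // FSDataOutputStream streams bytes to the first DataNode, which
             // forwards them along the replication pipeline.
             FSDataOutputStream out = fs.create(new Path("/data/output.txt"))) {
            out.writeUTF("hello hdfs");
        } // close() waits for the pipeline acknowledgments before returning
    }
}
```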
02-05-2018
10:13 AM
@Malay Sharma, If you are using Ambari, go to HDFS -> Configs -> Settings. Under NameNode you can give comma-separated values for 'NameNode directories' (dfs.namenode.name.dir). Similarly, for DataNode you can give comma-separated values for 'DataNode directories' (dfs.datanode.data.dir). Thanks, Aditya
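For reference, a hedged sketch of the comma-separated format these two properties expect if set programmatically instead of through Ambari; the local directory paths are hypothetical examples.

```java
import org.apache.hadoop.conf.Configuration;

public class MultiDirConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical mount points; the key point is the comma-separated list
        // that Ambari writes into hdfs-site.xml for these properties.
        conf.set("dfs.namenode.name.dir", "/hadoop/hdfs/namenode,/mnt/disk2/hdfs/namenode");
        conf.set("dfs.datanode.data.dir", "/hadoop/hdfs/data,/mnt/disk2/hdfs/data");
        System.out.println(conf.get("dfs.namenode.name.dir"));
    }
}
```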
02-05-2018
06:52 PM
@Malay Sharma It really depends on your workload. The best case is when you can pick a good block size: if you have really large files, set the block size to 256 MB (this reduces the number of blocks in the file system, since you are storing more data per block), otherwise use the default 128 MB. If you have a running cluster, SmartSense can give you a view of the file size distribution of your current cluster. That will tell you whether you need to tune the block size for performance. From experience, I can tell you that the performance of your cluster is not going to depend on block size unless you have a large number of files in the system. Also, unlike a physical file system, setting the block size to 128 MB does not mean that each block write will use up 128 MB. HDFS only uses the number of bytes actually written, so there is no wasted space because of the block size.
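A small sketch, assuming the Hadoop client libraries, of the two places the block size can be set: the client-side dfs.blocksize default and a per-file value at create time. The 256 MB figure and the target path are illustrative only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default block size for files created by this client: 256 MB in bytes.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        try (FileSystem fs = FileSystem.get(conf);
             // The block size can also be chosen per file at create time:
             // create(path, overwrite, bufferSize, replication, blockSize)
             FSDataOutputStream out = fs.create(
                     new Path("/data/large-file.bin"), true, 4096,
                     (short) 3, 256L * 1024 * 1024)) {
            out.write(new byte[]{1, 2, 3}); // only 3 bytes of space are used
        }
    }
}
```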
02-05-2018
06:41 PM
2 Kudos
@Malay Sharma HDFS is very good at caching all file names and block addresses in memory. This is done at the NameNode level, and it is what makes HDFS incredibly fast: modifications to the file system and lookups of a file's location can all be served with no disk I/O.

This design choice of keeping all metadata in memory at all times has certain trade-offs. One of them is that we need to spend some bytes (think hundreds of bytes) per file and per block. This leads to an issue in HDFS: when you have a file system with 500 to 700 million files, the amount of RAM that the NameNode needs to reserve becomes large, typically 256 GB or more. At that size the JVM is hard at work too, since it has to do things like garbage collection. There is another dimension to this: with 700 million files, it is quite possible that your cluster is serving 30-40K or more requests per second, which also creates a lot of memory churn.

So a large number of files, combined with lots of file system requests, makes the NameNode a bottleneck in HDFS. In other words, the metadata that we need to keep in memory creates a bottleneck in HDFS.

There are several solutions and works in progress to address this problem. HDFS federation, which is being shipped as part of HDP 3.0, allows many NameNodes to work against a single set of DataNodes. In HDFS-7240, we are trying to separate the block space from the namespace, which allows us to immediately double or quadruple the effective size of the cluster. A good document that tracks the various issues and approaches to scaling the NameNode is "Uber scaling namenode". There is another approach where we try to send read workloads to the standby NameNode, freeing up the active NameNode and thus scaling it better; that work is tracked in "Consistent Reads from Standby Node".

Please let me know if you have any other questions. Thanks, Anu
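To make the memory argument concrete, here is a rough back-of-the-envelope sketch; the ~150 bytes per object and the one-block-per-file ratio are assumptions for illustration, not measured values.

```java
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long files = 700_000_000L;   // files in the namespace
        long blocks = 700_000_000L;  // assume roughly one block per file
        long bytesPerObject = 150L;  // hypothetical metadata cost per file/block

        double heapGb = (files + blocks) * bytesPerObject / (1024.0 * 1024 * 1024);
        System.out.printf("Estimated NameNode heap: ~%.0f GB%n", heapGb);
        // ~196 GB of metadata alone, before GC overhead and RPC buffers,
        // which is why heaps in the 256 GB range show up at this scale.
    }
}
```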