01-27-2015 10:50 AM
In my experiments with CDH5, I have always set the staging area parameter "hadoop.tmp.dir" to "/tmp" on HDFS.
My main question is: is this staging area located on the local disk of some DataNode that the NameNode picks at random?
If so, then I understand the HDFS file write path to be: the client gets the name of the DataNode hosting the 'staging' area, writes the first block to it, then initiates the replication pipeline to mirror this block to the other DataNodes (specified by the NameNode). Is this correct?
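
To make the sequence I am asking about concrete, here is a toy model in plain Java. It is only a sketch of the steps as described in this question, not the Hadoop API: the class `WritePathSketch`, the methods `chooseTargets` and `writeBlock`, and the node names `dn1` to `dn4` are all hypothetical, and the "pick the first N live nodes" rule is a placeholder assumption rather than the NameNode's real placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WritePathSketch {

    // Hypothetical stand-in for the NameNode's choice of target DataNodes.
    // Assumption: simply take the first `replication` live nodes.
    static List<String> chooseTargets(List<String> liveDataNodes, int replication) {
        int n = Math.min(replication, liveDataNodes.size());
        return new ArrayList<>(liveDataNodes.subList(0, n));
    }

    // Walk the pipeline in order: the client sends the block to the first
    // DataNode, and each DataNode forwards it to the next one in the list.
    static List<String> writeBlock(String blockId, List<String> pipeline) {
        List<String> hops = new ArrayList<>();
        String sender = "client";
        for (String dataNode : pipeline) {
            hops.add(sender + " -> " + dataNode + " [" + blockId + "]");
            sender = dataNode;
        }
        return hops;
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        List<String> pipeline = chooseTargets(live, 3);
        for (String hop : writeBlock("blk_0001", pipeline)) {
            System.out.println(hop);
        }
    }
}
```

The model says nothing about where "/tmp" lives; it only shows the order of hops I have in mind (client to first DataNode, then DataNode to DataNode), which is the part I would like confirmed or corrected.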