04-14-2016 12:01 PM
5 Kudos
Hi, I am new to the Hadoop ecosystem, so please forgive me if these questions sound trivial.

1. I read that HDFS has a default replication factor of 3, meaning that whenever a file is written, each block of the file is stored 3 times. Writing the same block 3 times requires more I/O than writing it once. How does Hadoop address this? Won't it be a problem when writing large datasets?

2. Since HDFS is a virtual filesystem, the data it stores is ultimately stored on the underlying operating system (in most cases, Linux). Assuming Linux uses the ext3 filesystem (whose block size is 4 KB), how does having a 64 MB / 128 MB block size in HDFS help? Is a 64 MB HDFS block split into x * 4 KB blocks of the underlying operating system?

Thanks, RK.
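For reference, the two settings the questions refer to are dfs.replication and dfs.blocksize. The short Java sketch below is only illustrative (not part of the original question): it assumes a standard Hadoop client classpath and configuration on the classpath, and uses a hypothetical path /tmp/example.txt. It reads the cluster defaults and then creates a file with an explicit replication factor and block size.

// Illustrative sketch, assuming a standard Hadoop client setup.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettingsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // loads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/tmp/example.txt");      // hypothetical path, for illustration only

        // Cluster-wide defaults that the questions ask about.
        short defaultReplication = fs.getDefaultReplication(p); // dfs.replication, typically 3
        long defaultBlockSize = fs.getDefaultBlockSize(p);      // dfs.blocksize, e.g. 128 MB = 134217728 bytes
        System.out.println("replication=" + defaultReplication
                + ", blockSize=" + defaultBlockSize);

        // Both values can also be chosen per file at create time.
        short replication = 2;               // fewer copies -> less write I/O, but less redundancy
        long blockSize = 64L * 1024 * 1024;  // a 64 MB HDFS block; the local filesystem still stores it as many 4 KB blocks
        try (FSDataOutputStream out =
                 fs.create(p, true, 4096, replication, blockSize)) {
            out.writeUTF("hello hdfs");
        }
    }
}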
Labels:
- Apache Hadoop