Support Questions

aishwaryamdixit · ‎08-07-2017

nicholas_ruggie · ‎08-07-2017

Data is stored in 8K blocks on disk. These blocks make up 128K stripes that are parity protected and striped across nodes and disks. Files smaller than 128K are mirrored instead. This gives a good balance between file size and storage efficiency: because Isilon protection is parity based, overall storage utilization is better than with replication. HDFS blocks, which are 128MB for example, are triple mirrored when stored (note that this is configurable). As an example, on a 5 node Isilon cluster (very common) with N+1 protection, a file is broken up into 4 data stripes and one parity stripe (aka 4+1) that are distributed across the cluster. The parity overhead is 1/4 of the data size (25%), which is 1/5 (20%) of the raw capacity consumed, so the effective on-disk footprint is 125% for Isilon versus 300% for HDFS.
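If it helps, here is a rough sketch of the overhead arithmetic for the 4+1 example versus triple mirroring. The function names are made up for illustration and this is only the ratio math, not anything OneFS or HDFS actually exposes.

```python
def parity_footprint(data_units: int, parity_units: int) -> float:
    """On-disk size as a multiple of the logical data size for an N+M layout."""
    # Every group of data_units stripes also stores parity_units parity stripes.
    return (data_units + parity_units) / data_units


def replication_footprint(copies: int) -> float:
    """On-disk size as a multiple of the logical data size with full copies."""
    return float(copies)


isilon = parity_footprint(4, 1)    # 4 data stripes + 1 parity stripe
hdfs = replication_footprint(3)    # default-style triple mirroring

print(f"4+1 parity:     {isilon:.0%} of logical size")
print(f"3x replication: {hdfs:.0%} of logical size")
```

For a 4+1 layout this works out to 5 units stored for every 4 units of data, compared with 3 full copies under triple mirroring.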

FWIW, Isilon uses the HDFS protocol, and as such it uses the HDFS block size setting when sending files across the network. This value can be tuned for specific workflows and should correspond to the dfs.blocksize parameter.
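To get a feel for what that setting means on the client side, the number of HDFS-protocol blocks a file is split into is just the file size divided by dfs.blocksize, rounded up. A minimal sketch, assuming a 128MB block size; the helper name is hypothetical and this says nothing about how the data is laid out on the Isilon disks.

```python
def hdfs_block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of protocol-level blocks for a file of the given size."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partial last block still counts as a block.
    return -(-file_size_bytes // block_size_bytes)


MiB = 1024 * 1024
block_size = 128 * MiB          # assumed value of dfs.blocksize

print(hdfs_block_count(1024 * MiB, block_size))      # 1 GiB file
print(hdfs_block_count(1024 * MiB + 1, block_size))  # one byte over 1 GiB
```

A larger block size means fewer, bigger blocks per file, which is why the value is worth matching to the workflow.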

Cloudera Community

What is the block size while storing the files in Isilon? How is it better than HDFS?