Heterogeneous Storage in HDFS

Hadoop 2.6.0 introduced a new feature: heterogeneous storage. Heterogeneous storage lets each storage medium play to its own read and write strengths. This makes it well suited to cold data, which needs large capacity but not high read and write performance and can live on ordinary disks, while hot data can be stored on SSD. When efficient reads are required, SSD can deliver ten or even a hundred times the read and write speed of an ordinary disk, and data can even be written directly to memory and lazily persisted to HDFS.

With HDFS heterogeneous storage we do not need to build two separate clusters for cold and hot data. Both classes of data can be stored within a single cluster, which gives this feature great practical value. Here I introduce the heterogeneous storage types and how to configure heterogeneous storage flexibly. Typical scenarios are:

  • ARCHIVE - Ultra-cold data on very inexpensive hard disk storage, for example bank notes and video systems.
  • DISK - Large-scale deployments with sequential I/O reads and writes. This is the default storage type.
  • SSD - Efficient data queries, visualization, and external data sharing where performance must improve.
  • RAM_DISK - For extreme performance.
  • Hybrid disk - An SSD combined with an HDD (SATA or SAS).

HDFS Storage Types

ARCHIVE - Archival storage is for very dense storage and is useful for rarely accessed data. This storage type is typically cheaper per TB than normal hard disks.

DISK - Hard disk drives are relatively inexpensive and provide sequential I/O performance. This is the default storage type.

SSD - Solid state drives are useful for storing hot data and I/O-intensive applications.

RAM_DISK - This special in-memory storage type is used to accelerate low-durability, single-replica writes.
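A DataNode learns the storage type of each volume from a tag placed in front of the directory in the dfs.datanode.data.dir property, for example [SSD]file:///some/dir. The sketch below only assembles such a value from a list of volumes. The helper name data_dir_value and the mount points are hypothetical and chosen for illustration; check the property format against the documentation for your Hadoop version before using it.

```python
# Storage types described above. A directory without a tag is treated as DISK.
STORAGE_TYPES = ("ARCHIVE", "DISK", "SSD", "RAM_DISK")


def data_dir_value(volumes):
    """Build a dfs.datanode.data.dir value from (storage_type, directory) pairs.

    Each entry has the form [TYPE]file://<absolute directory>, and entries
    are joined with commas.
    """
    entries = []
    for storage_type, directory in volumes:
        if storage_type not in STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        if not directory.startswith("/"):
            raise ValueError(f"directory must be absolute: {directory}")
        entries.append(f"[{storage_type}]file://{directory}")
    return ",".join(entries)


# Hypothetical mount points, for illustration only.
example_volumes = [
    ("DISK", "/grid/0/hdfs/data"),
    ("SSD", "/grid/ssd0/hdfs/data"),
    ("ARCHIVE", "/grid/archive0/hdfs/data"),
]

if __name__ == "__main__":
    print(data_dir_value(example_volumes))
```

The resulting string would be set as the value of dfs.datanode.data.dir in hdfs-site.xml (or through Ambari) on each DataNode that has those volumes.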

HDFS Storage Policies

HDFS has six preconfigured storage policies:

Hot - All replicas are stored on DISK.

Cold - All replicas are stored on ARCHIVE.

Warm - One replica is stored on DISK and the others are stored on ARCHIVE.

All_SSD - All replicas are stored on SSD.

One_SSD - One replica is stored on SSD and the others are stored on DISK.

Lazy_Persist - The replica is written to RAM_DISK and then lazily persisted to DISK.
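The six policies above can be read as a rule for where each replica of a block should go. The sketch below models that rule as a small table and also builds the command line commonly used to assign a policy to a path. The function names are hypothetical. The Lazy_Persist row assumes that replicas beyond the first go to DISK, and the hdfs storagepolicies syntax is the one found in recent Hadoop releases, so verify it against your version.

```python
# Policy name -> (storage for the first replica, storage for the remaining replicas).
POLICIES = {
    "HOT": ("DISK", "DISK"),
    "COLD": ("ARCHIVE", "ARCHIVE"),
    "WARM": ("DISK", "ARCHIVE"),
    "ALL_SSD": ("SSD", "SSD"),
    "ONE_SSD": ("SSD", "DISK"),
    # Assumption: with more than one replica, the extra replicas go to DISK.
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),
}


def replica_placement(policy, replication=3):
    """Return the preferred storage type for each replica under a policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    first, rest = POLICIES[policy.upper()]
    return [first] + [rest] * (replication - 1)


def set_policy_command(path, policy):
    """Return the argument list for assigning a storage policy to an HDFS path."""
    name = policy.upper()
    if name not in POLICIES:
        raise ValueError(f"unknown storage policy: {policy}")
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", name]


if __name__ == "__main__":
    for name in POLICIES:
        print(name, replica_placement(name))
    # Hypothetical path, for illustration only.
    print(" ".join(set_policy_command("/data/archive/2015", "Cold")))
```

Setting a policy affects where new blocks are written. Blocks that already exist generally stay where they are until a tool such as hdfs mover is run on the path.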

In the next article I'll show practical usage of HDFS storage settings and how to set a storage policy for HDFS using Ambari.

To be continued.

Comments

@mkumar13 

How can we move data older than 2 years to ARCHIVE?