
How to define HDFS storage tiers and storage polices in CDH 5.4.x

New Contributor

In order to use Hadoop 2.6 storage policies, you must specify the storage type (DISK, ARCHIVE, RAM_DISK, SSD) for each mount point in dfs.datanode.data.dir. If I were editing the hdfs-site.xml file directly, I would do:

<property>
     <name>dfs.datanode.data.dir</name>
     <value>[ARCHIVE]file:///mnt/archive/dfs/dn,[SSD]file:///mnt/flash/dfs/dn,[DISK]file:///mnt/disk/dfs/dn</value>
</property>

However, if I try to use this format in the CM GUI, I get the following errors:

  • DataNode Data Directory: Path [ARCHIVE]:///mnt/archive/dfs/dn does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*"
  • DataNode Data Directory: Path [DISK]:file///mnt/disk/dfs/dn does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*"
  • DataNode Data Directory: Path [SSD]:file///mnt/flash/dfs/dn does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*"

Does anyone know the correct format for specifying storage tiers in the GUI, or how to bypass the GUI and configure this manually?

Thank you,

Daniel

CM_Tiers_Error.png

4 REPLIES

Mentor
CM currently lacks support for defining storage types. If you'd like to use this feature right now, place your XML override in the "DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" instead, which accepts <property/> tags.
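For illustration, the snippet pasted into that safety valve field would carry the same property from the original post (the mount points are examples and should match your own hardware):

<property>
    <name>dfs.datanode.data.dir</name>
    <value>[DISK]file:///mnt/disk/dfs/dn,[SSD]file:///mnt/flash/dfs/dn,[ARCHIVE]file:///mnt/archive/dfs/dn</value>
</property>

CM merges safety valve snippets into the hdfs-site.xml it generates, so this value should take precedence over the GUI-configured data directories; the DataNodes need a restart to pick it up.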

New Contributor

Thanks - that worked.

I'm assuming that I should leave the mount points in the dfs.datanode.data.dir section as well, so that CM knows to monitor them.

Mentor
Yes, that'd be a good idea.

Glad to hear it worked! Feel free to also mark the discussion as solved so others looking at similar issues may find this thread faster.

Explorer

Some more questions based on this thread:

Once the storage configuration is defined and the SSDs/disks are identified by HDFS:

  1. Are all drives (SSDs + disks) used as a single virtual storage pool?
    1. If yes, does that mean that while running jobs/queries some data blocks would be fetched from disks while others come from SSDs?
  2. Or are there two separate virtual storage pools, hot and cold?
    1. If yes, when copying/generating data in HDFS, will there be 3 copies of the data across disks + SSDs, or 3 copies on disks and 3 copies on SSDs, i.e. 6 copies in total?
    2. How do I force data to be read from SSDs only or from disks only when submitting jobs/queries with the various tools (Hive, Impala, Spark, etc.)? (See the sketch below.)
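A minimal sketch of how Hadoop 2.6 / CDH 5.4.x addresses the last point, assuming the storage types defined earlier in the thread and the hypothetical paths /data/hot and /data/cold. The cluster stays a single namespace with the normal 3 replicas per block; a per-path storage policy decides which storage types those replicas land on (so 3 copies total, not 6). Verify the exact policy names against the output of the first command on your release:

# List the storage policies the NameNode knows about.
hdfs storagepolicies

# Keep every replica of files under /data/hot on SSD
# (ONE_SSD would keep one replica on SSD and the rest on DISK).
hdfs dfsadmin -setStoragePolicy /data/hot ALL_SSD

# Keep every replica of files under /data/cold on ARCHIVE storage.
hdfs dfsadmin -setStoragePolicy /data/cold COLD

# Check which policy applies to a path.
hdfs dfsadmin -getStoragePolicy /data/hot

# Blocks written before a policy change are not moved automatically;
# the mover migrates existing replicas to match the new policy.
hdfs mover -p /data/cold

Jobs submitted through Hive, Impala, Spark, etc. don't choose a tier themselves; they simply read whichever replicas exist, so placing a table's or dataset's directory under a policy-tagged path is the usual way to steer its data onto SSDs or disks.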