
How to define HDFS storage tiers and storage policies in CDH 5.4.x

Solved

New Contributor

In order to use Hadoop 2.6 storage policies, you must specify the storage type (DISK, ARCHIVE, RAM_DISK, SSD) for each mount point in dfs.datanode.data.dir. If I were editing the hdfs-site.xml file directly, I would do:

 

<property>
     <name>dfs.datanode.data.dir</name>
     <value>[ARCHIVE]file:///mnt/archive/dfs/dn,[SSD]file:///mnt/flash/dfs/dn,[DISK]file:///mnt/disk/dfs/dn</value>
</property>

However, if I try to use this format in the CM GUI, I get the following errors:

  • DataNode Data Directory: Path [ARCHIVE]:///mnt/archive/dfs/dn does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*"
  • DataNode Data Directory: Path [DISK]:file///mnt/disk/dfs/dn does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*"
  • DataNode Data Directory: Path [SSD]:file///mnt/flash/dfs/dn does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*"

 

Does anyone know the correct format for specifying storage tiers in the GUI, or how to bypass the GUI and configure this manually?

 

Thank you 

 

Daniel

 

CM_Tiers_Error.png

 

1 ACCEPTED SOLUTION


Re: How to define HDFS storage tiers and storage policies in CDH 5.4.x

Master Guru
CM currently lacks support to define storage types. If you'd like to use this feature at the moment, place your XML override in the "DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" instead, which accepts <property/> tags.
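For illustration, the override pasted into the safety valve would be the same <property/> block the poster showed for hdfs-site.xml (the mount paths below are the poster's example paths, not a recommendation):

```xml
<!-- DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[ARCHIVE]file:///mnt/archive/dfs/dn,[SSD]file:///mnt/flash/dfs/dn,[DISK]file:///mnt/disk/dfs/dn</value>
</property>
```

Because the safety valve is merged into the generated hdfs-site.xml last, this value overrides whatever CM writes for dfs.datanode.data.dir from its own configuration field.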


4 REPLIES 4


Re: How to define HDFS storage tiers and storage policies in CDH 5.4.x

New Contributor

Thanks - that worked.

 

I'm assuming that I should leave the mounts in the dfs.datanode.data.dir field as well, so that CM knows to monitor them.


Re: How to define HDFS storage tiers and storage policies in CDH 5.4.x

Master Guru
Yes, that'd be a good idea.

Glad to hear it worked! Feel free to also mark the discussion as solved so others looking at similar issues may find this thread faster.

Re: How to define HDFS storage tiers and storage policies in CDH 5.4.x

Explorer

Some more questions based on this thread:

 

Once the storage configuration is defined and the SSDs/disks are identified by HDFS:

  1. Are all drives (SSDs + disks) used as a single virtual storage pool?
    1. If yes, does that mean that while running jobs/queries, some data blocks would be fetched from disks while others come from SSDs?
  2. Or are there two different virtual storage pools, hot and cold?
    1. If yes, while copying/generating data in HDFS, will there be 3 copies of the data spread across disks + SSDs, or 3 copies on disks and 3 copies on SSDs (6 copies total)?
    2. How do I force data to be read from SSDs only, or disks only, when submitting jobs/queries with the various tools (Hive, Impala, Spark, etc.)?
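For reference, block placement across tiers is steered per-path with HDFS storage policies rather than separate namespaces: replicas of one block are placed on volumes whose storage type matches the path's policy, and the replication factor stays whatever you configured (no doubled copies). A sketch with the hdfs CLI is below; the /data/hot and /data/cold paths are hypothetical, and note that in Hadoop 2.6 the set/get subcommands lived under `hdfs dfsadmin` before moving to `hdfs storagepolicies` in later releases:

```shell
# List the policies the NameNode supports (HOT, COLD, WARM, ALL_SSD, ONE_SSD, ...)
hdfs storagepolicies -listPolicies

# Pin a directory's blocks to SSD-typed volumes (hypothetical path)
hdfs storagepolicies -setStoragePolicy -path /data/hot -policy ALL_SSD

# Send cold data to ARCHIVE-typed volumes
hdfs storagepolicies -setStoragePolicy -path /data/cold -policy COLD

# Verify what is set on a path
hdfs storagepolicies -getStoragePolicy -path /data/hot

# Policies only affect new writes; migrate existing blocks with the mover
hdfs mover -p /data/hot /data/cold
```

Tools such as Hive, Impala, and Spark then read whichever replicas exist, so "SSD only" is achieved by setting an SSD policy on the table's or dataset's HDFS directory rather than per-query.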

 
