Contributor
Posts: 36
Registered: ‎01-11-2016

CDH with AWS EBS ST1/SC1 volumes (Cloudera Enterprise Reference Architecture for AWS Deployments)

Hi,

 

This post is directed to Cloudera Technical Staff who may be logged into the Cloudera Community and who may be able to advise, and either correct me or, hopefully, take steps to correct what I've read.

 

I've just read the latest "Cloudera Enterprise Reference Architecture for AWS Deployments" PDF (from July 2016), and on pages 16-17 of this PDF, it mentions the following under the "EBS Volume Tuning" heading:

 

---------------------------

EBS Volume Tuning:

 

Per EBS performance guidance, increase readahead for high-throughput, read-heavy workloads on st1 and sc1:

 

     sudo blockdev --setra 2048 /dev/<device>

 

To verify the read-ahead:

 

     sudo blockdev --report /dev/<device>

 

---------------------------------

 

 

The "blockdev --setra" command does not, to my knowledge, persist the Linux Read Ahead value for an EBS volume any longer than the current system state (i.e. it does not set it permanently, so the value does not survive re-boots of the AWS EC2 instance).

 

If I set up a new CDH cluster with a bunch of AWS EC2 "m4.4xlarge" instances (EBS optimized, HVM based CentOS AMI) running as Worker Nodes (with DataNode and NodeManager services) and the following HDFS parameter pointing at 4 x 2000GB EBS ST1/SC1 volumes:

 

DataNode Data Directory for each AWS EC2 Linux instance running as a Worker Node (dfs.data.dir, dfs.datanode.data.dir)

 

/disk/b/appdata/dfs/dn         # 2000GB AWS EBS ST1 volume, mounted on Linux block device /dev/xvdb

/disk/c/appdata/dfs/dn         # 2000GB AWS EBS ST1 volume, mounted on Linux block device /dev/xvdc

/disk/d/appdata/dfs/dn         # 2000GB AWS EBS ST1 volume, mounted on Linux block device /dev/xvdd

/disk/e/appdata/dfs/dn         # 2000GB AWS EBS ST1 volume, mounted on Linux block device /dev/xvde

 

 

 

Then when I re-boot each/any of the Worker Nodes, the previous Read Ahead settings configured using the commands shown above are lost and Read Ahead for each Worker Node volume returns to the default value (256 x 512 bytes on CentOS 6.5 systems).

 

Running the "blockdev --report" command on each of these EBS EC2 Worker Nodes confirms this.
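
The check above can be scripted so that the same report is easy to repeat after every re-boot. The following is only a minimal sketch: the function names `ra_sectors_to_kib` and `show_readahead` are made up for illustration, and the device names are the ones used in this post. It relies on `blockdev --getra`, which prints the read-ahead in 512-byte sectors, so the default of 256 corresponds to 128 KiB and the recommended 2048 corresponds to 1024 KiB.

```shell
#!/bin/sh
# Convert a read-ahead value in 512-byte sectors (the unit used by
# blockdev --setra / --getra) to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Print the current read-ahead of every device passed as an argument.
# Paths that are not block devices are reported and skipped.
show_readahead() {
    for dev in "$@"; do
        if [ -b "$dev" ]; then
            ra=$(blockdev --getra "$dev")
            echo "$dev: $ra sectors ($(ra_sectors_to_kib "$ra") KiB)"
        else
            echo "$dev: not a block device, skipped"
        fi
    done
}

# Device names are taken from this post; adjust them to your own layout.
# Run as root (or via sudo) so that blockdev can query the devices.
show_readahead /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
```

Running this once before and once after a re-boot should make it obvious whether the value has fallen back to the default.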

 

In order to persist the Read Ahead settings, I need to add the following to the /etc/rc.local file on my CentOS Worker Nodes:

 

# --------------------------------------------------------------------------------------

# Added by Damion to persist increasing the CentOS Read Ahead value across re-boots.

# This is recommended when running Cloudera Hadoop on AWS EBS volumes of type ST1/SC1.

#

# This should ONLY be done on file systems like HDFS that perform very large sequential

# reads (HDFS block size of 64MB/128MB).

#

# DO NOT do this on AWS EBS volumes that are not part of HDFS file systems.

#

# Doing this on file systems that don't undergo very large sequential reads means you

# could see a massive decrease in IOPS (because some Linux kernels disable readahead

# when they detect non-sequential access patterns).

#

# In the commands below:

# /dev/xvd[b,c,d,e] is mounted on /disk/[b,c,d,e] and will be used as a HDFS f/s

# --------------------------------------------------------------------------------------

blockdev --setra 2048 /dev/xvdb

blockdev --setra 2048 /dev/xvdc

blockdev --setra 2048 /dev/xvdd

blockdev --setra 2048 /dev/xvde
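
The four commands above could also be written as a loop that skips any device that is missing at boot time (for example, if a volume has not been attached yet). This is a sketch under assumptions, not a tested recommendation: the function name `apply_readahead` and the variable names are hypothetical, and the device list simply mirrors the one used in this post.

```shell
#!/bin/sh
# Read-ahead in 512-byte sectors (2048 sectors = 1 MiB), as suggested in
# the reference architecture for st1/sc1 volumes.
RA_SECTORS=2048

# HDFS data volumes only; the names are the ones used in this post.
DEVICES="/dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde"

# Set the read-ahead on each device given after the first argument.
# $1 = read-ahead in sectors, remaining arguments = block devices.
apply_readahead() {
    ra="$1"
    shift
    for dev in "$@"; do
        if [ -b "$dev" ]; then
            blockdev --setra "$ra" "$dev"
        else
            # Do not abort the rest of rc.local because of one missing volume.
            echo "skipping $dev: not a block device" >&2
        fi
    done
}

# DEVICES is intentionally unquoted so that it splits into one word per device.
apply_readahead "$RA_SECTORS" $DEVICES
```

Keeping the device list in a single variable means there is only one line to change when volumes are added or removed, and the warning on standard error leaves a trace when an expected volume is absent.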

 

 

 

My point: can you confirm this is the case and, if so, is it worth updating the "Cloudera Enterprise Reference Architecture for AWS Deployments" PDF document with information about making this setting permanent?

 

 

Cheers,

 

Damion

 

Cloudera Employee
Posts: 6
Registered: ‎07-08-2013

Re: CDH with AWS EBS ST1/SC1 volumes (Cloudera Enterprise Reference Architecture for AWS Deployments)

Hi Damion,

 

Absolutely correct, blockdev does not persist on reboot.

 

Tuning the block device settings in rc.local (or other post-boot script) is the recommended way, ideally with the amount of documentation that you've included in your example!

 

The next revision of the Reference Architecture for AWS will be updated accordingly. Thanks for the suggestion!

 

- Alex

 

 
