Explorer
Posts: 10
Registered: ‎06-25-2015

Cloudera Express- HDFS - Using CLI to Backup HDFS to S3 Bucket in AWS

Hello,

I am trying to set up HDFS backups to S3 buckets on Cloudera. I searched the internet high and low for the syntax to do this via the CLI and had no luck!

After many hours I figured out how to get this to work.

 

If your AWS keys are defined in the properties file, use:

sudo -u hdfs hdfs dfs -cp hdfs://nameservice/* s3n://BUCKET-NAME/

If you do not have them defined, put the keys in the URI:

sudo -u hdfs hdfs dfs -cp hdfs://nameservice/* s3n://ACCESS-KEY-ID:SECRET-ACCESS-KEY@BUCKET-NAME/

 

I hope that saves many users hours of guessing and configuration.

OK, so my issue is:

What is the syntax for s3 vs. s3n? s3n works flawlessly, but it limits files to less than 5 GB in size!

Since the HDFS data I am backing up is larger than 5 GB, I will need to use s3://.

What am I missing? Changing the above commands from s3n:// to s3:// gives me the following error:

 

cp: `s3://BUCKET-NAME/': No such file or directory

Explorer
Posts: 10
Registered: ‎06-25-2015

Re: Cloudera Express- HDFS - Using CLI to Backup HDFS to S3 Bucket in AWS

Also, the properties where you save your AWS access key and secret key are set here:

Cloudera Manager > HDFS Configuration > Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml

Then set the following properties:

 

<!-- Amazon S3 -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ACCESS-KEY-ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET-ACCESS-KEY</value>
</property>
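
To confirm that the client configuration actually picked up these properties, something like the following could help. It is only a sketch: it assumes the `hdfs` CLI is on the PATH and that `hdfs getconf -confKey` exits non-zero for a missing key. The helper name `check_conf_key` is made up for illustration. It reports only whether each key is set and never prints the value.

```shell
#!/bin/sh
# Sketch: report whether each S3 credential property is visible to the
# HDFS client. check_conf_key is a hypothetical helper, not a Hadoop tool.

check_conf_key() {
  key=$1
  # Bail out early if the hdfs CLI is not available on this host.
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "$key: hdfs CLI not found"
    return 1
  fi
  # Discard the output so the secret value never reaches the terminal.
  if hdfs getconf -confKey "$key" >/dev/null 2>&1; then
    echo "$key: set"
  else
    echo "$key: not set"
  fi
}

# Property names taken from the core-site.xml snippet above.
for key in fs.s3.awsAccessKeyId fs.s3.awsSecretAccessKey; do
  check_conf_key "$key" || true
done
```

If a key shows as not set after saving the safety valve, the client configuration may not have been redeployed yet.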

Explorer
Posts: 10
Registered: ‎06-25-2015

Re: Cloudera Express- HDFS - Using CLI to Backup HDFS to S3 Bucket in AWS

Does anybody have any ideas on which I should be using for HDFS: s3 or s3n?

Explorer
Posts: 10
Registered: ‎06-25-2015

Re: Cloudera Express- HDFS - Using CLI to Backup HDFS to S3 Bucket in AWS

Hello,

I never got a response on how to automate backups with the Cloudera Express version.

Any ideas on how to do this with AWS? Any help or guidance would be greatly appreciated!

 

Cloudera Employee
Posts: 578
Registered: ‎01-20-2014

Re: Cloudera Express- HDFS - Using CLI to Backup HDFS to S3 Bucket in AWS

S3A is now the preferred protocol from CDH 5.3 onwards to copy data into S3. There is no 5 GB file limitation with S3A either.

The distcp tool is available as standard; you can use that to copy multiple files over from HDFS if you wish. The doc does not refer to S3A, but you should be able to use s3a in place of s3n with no other change.

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_admin_distcp_data_c...
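
As a rough illustration of the suggestion above, a distcp backup over s3a might be composed like this. It is a sketch under assumptions, not a tested procedure: the dated `backup/` prefix and the `build_distcp_cmd` helper are invented for the example, `BUCKET-NAME` is the placeholder used earlier in the thread, and the S3A credentials are assumed to be configured already (the s3a connector reads `fs.s3a.access.key` and `fs.s3a.secret.key`). The script only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: compose a distcp command that copies HDFS data to S3 via s3a.
# The bucket name and the dated prefix are placeholders, not from the thread.

# Print the distcp command for a source URI, a bucket, and a destination prefix.
build_distcp_cmd() {
  printf 'hadoop distcp %s s3a://%s/%s\n' "$1" "$2" "$3"
}

# Assumed layout: one dated folder per backup run.
prefix="backup/$(date +%Y-%m-%d)/"
cmd=$(build_distcp_cmd "hdfs://nameservice/" "BUCKET-NAME" "$prefix")

# Dry run by default; set RUN=1 to execute as the hdfs user.
if [ "${RUN:-0}" = "1" ]; then
  # $cmd is left unquoted on purpose so it splits into separate arguments.
  sudo -u hdfs $cmd
else
  echo "$cmd"
fi
```

For automation, a script along these lines could be called from a cron entry on a gateway host with `RUN=1` set; the schedule and script location would depend on your environment.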

Regards,
Gautam Gopalakrishnan
Explorer
Posts: 10
Registered: ‎06-25-2015

Re: Cloudera Express- HDFS - Using CLI to Backup HDFS to S3 Bucket in AWS

Do you think it would be better / safer to copy HBase data to S3, or should I export it to another NoSQL database?

 

Thanks! 




 
