06-25-2015 07:52 AM
I am trying to set up HDFS backup to S3 buckets on Cloudera. I have searched the internet high and low for the syntax to do this via the CLI and have had no luck!
After many hours I figured out how to get this to work.
If you have the keys defined in the properties file, use:
sudo -u hdfs hdfs dfs -cp hdfs://nameservice/* s3n://@BUCKET-NAME/
If you do not have them defined:
sudo -u hdfs hdfs dfs -cp hdfs://nameservice/* s3n://SECRET-KEY:PRIVATE-KEY@s3://s3-us-BUCKET-NAME/
I hope that saves many users hours of guessing and configuration
Ok so my issue is
What is the correct syntax for s3 vs. s3n? s3n works flawlessly, but it limits files to less than 5 GB in size!
Since I am backing up HDFS files that are >5 GB, I will need to use s3://.
What am I missing? Changing the above commands from s3n:// to s3:// gives me the following error:
cp: `s3://BUCKET-NAME/': No such file or directory
06-25-2015 09:02 AM
Also, the properties file where you save your secret / private key is:
Cloudera Manager > HDFS Configuration > Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
Then set the following properties:
<!-- Amazon S3 -->
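(The property values appear to have been stripped by the forum software; only the comment above survived. A sketch of what that core-site.xml snippet typically contains, assuming the standard Hadoop s3n credential keys — substitute your own values:)

```xml
<!-- Amazon S3 -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR-ACCESS-KEY-ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR-SECRET-ACCESS-KEY</value>
</property>
```

With these set, the s3n:// URI no longer needs the keys embedded in it.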
08-04-2015 12:42 PM
I never got a response on where to go to automate backups on the Cloudera Express version.
Any ideas on how to do this with AWS? Any help or guidance would be greatly appreciated!!
08-04-2015 06:23 PM
08-05-2015 08:53 AM
Do you think it would be better / safer to copy HBase data to S3, or should I export it to another NoSQL database?
S3A is now the preferred protocol from CDH 5.3 onwards for copying data into S3. There is no 5 GB file limitation with S3A either.
The DistCp tool is available as standard; you can use it to copy multiple files over from HDFS if you wish. The doc does not refer to S3A, but you should be able to use s3a in place of s3n with no other change.
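For example, a DistCp run over s3a might look like the following sketch (bucket name and paths are placeholders; the fs.s3a.access.key and fs.s3a.secret.key properties can alternatively be set in core-site.xml instead of passed with -D):

```shell
# Copy an HDFS directory to S3 over the s3a connector (no 5 GB object limit).
# BUCKET-NAME and both paths are placeholders -- substitute your own.
sudo -u hdfs hadoop distcp \
    -Dfs.s3a.access.key=YOUR-ACCESS-KEY-ID \
    -Dfs.s3a.secret.key=YOUR-SECRET-ACCESS-KEY \
    hdfs://nameservice/source-dir \
    s3a://BUCKET-NAME/backup-dir
```

DistCp runs as a MapReduce job, so it copies files in parallel, which also makes it more suitable than `hdfs dfs -cp` for scheduled backups of large directories.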