
Archiving data from Hadoop


We have a data retention requirement that is pushing the storage capacity of my cluster. There is no budget to increase storage or add more data nodes. I do have some older servers that I am thinking of using to archive older data that is rarely, if ever, accessed, and I am looking for guidance on best practices for doing this. One thought is to dump HDFS partition directories into a very similar file structure on the archive server. It seems there would be a huge I/O and network hit with this option, and I would have to make room to pull data back into the cluster if I ever needed to access it.
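
To make the mirroring idea concrete, here is a rough sketch in Python of how the set of partitions to archive might be chosen. It is only a planning step under assumptions of mine: partitions are named like `dt=YYYY-MM-DD`, the paths `/data/events` and `/archive/events` are made up, and `plan_archive` is a hypothetical helper. It does not copy anything.

```python
from datetime import date, timedelta
import posixpath


def partition_date(path):
    """Return the date in a path ending with dt=YYYY-MM-DD, else None.

    The dt=YYYY-MM-DD naming is an assumption for this sketch.
    """
    name = posixpath.basename(path.rstrip("/"))
    if not name.startswith("dt="):
        return None
    try:
        return date.fromisoformat(name[3:])
    except ValueError:
        return None


def plan_archive(partitions, hdfs_root, archive_root, today, keep_days):
    """Pair each partition older than keep_days with a mirrored archive path."""
    cutoff = today - timedelta(days=keep_days)
    plan = []
    for path in sorted(partitions):
        d = partition_date(path)
        if d is not None and d < cutoff:
            # Keep the same relative layout under the archive root.
            rel = posixpath.relpath(path, hdfs_root)
            plan.append((path, posixpath.join(archive_root, rel)))
    return plan


# Hypothetical listing of partition directories on the cluster.
partitions = [
    "/data/events/dt=2019-01-01",
    "/data/events/dt=2020-06-01",
    "/data/events/_SUCCESS",
]
plan = plan_archive(partitions, "/data/events", "/archive/events",
                    today=date(2020, 6, 30), keep_days=365)
for src, dst in plan:
    print(src, "->", dst)
```

Each (source, destination) pair would then be handed to whatever copy tool is used, for example `hdfs dfs -get` run from the archive server, and the source deleted only after the copy has been verified.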

Does anyone have experience with archiving data from Hadoop?
