We have a data retention requirement that is pushing the storage capacity of my cluster. Increasing storage or adding more data nodes is not in the budget, but I do have some older servers that I am thinking of using to archive older data that is rarely (if ever) accessed. I am trying to find some guidance on best practices for doing this. One thought is to dump HDFS partition directories into a very similar file structure on the archive server. It seems there would be a big I/O and network hit with this option, and I would also have to make room to pull data back into the cluster if I ever needed to access it again.
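For the "dump partition directories to the archive server" option, the usual tool is `hadoop distcp`, pointed at a single-node HDFS (or an NFS-mounted path) running on the old hardware. A rough sketch of what I mean; all hostnames and paths here are placeholders, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch: move one cold partition from the main cluster to an
# archive cluster with distcp, then reclaim the space.
# prod-nn / archive-nn and the warehouse paths are made-up examples.

SRC="hdfs://prod-nn:8020/warehouse/events/year=2019"
DST="hdfs://archive-nn:8020/archive/events/year=2019"

# -update skips files that already exist at the destination and
# -p preserves permissions/timestamps, so a rerun after a network
# hiccup only copies what is still missing.
CMD="hadoop distcp -update -p $SRC $DST"

# Echoed instead of executed so the sketch is safe anywhere;
# remove the echo to actually launch the copy.
echo "$CMD"

# Only after verifying the copy would the source partition be removed:
echo "hadoop fs -rm -r -skipTrash $SRC"
```

Pulling data back would just be the same `distcp` with `$SRC` and `$DST` swapped, which is part of why I am weighing whether the restore-side cost is acceptable.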
Does anyone have experience with archiving data from Hadoop?