HDFS Snapshots and Encryption Zones



Looking for best practices on where to allow snapshots in an HDFS directory tree when the tree contains encryption zones as child-level directories.

For example, assume we have an HDFS directory /data-dir.

We want to use HDFS snapshots for backup and DR purposes. Assume /data-dir has three child directories: /data-dir/encr-zone1, /data-dir/encr-zone2 and /data-dir/non-ez. Here encr-zone1 and encr-zone2 are set up as HDFS encryption zones, while non-ez is not. We have two choices for creating snapshots:

  • a. Make /data-dir snapshottable. Once /data-dir is snapshottable, a .snapshot directory is created under /data-dir, where all of the snapshot information is maintained.
  • b. Make each child directory (e.g. /data-dir/encr-zone1 and /data-dir/encr-zone2) individually snapshottable. Going this route prevents us from making /data-dir, or any grandchild directory under those child directories, snapshottable in the future. From the HDFS Snapshot documentation: “Nested snapshottable directories are currently not allowed. In other words, a directory cannot be set to snapshottable if one of its ancestors/descendants is a snapshottable directory.”
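To make the nesting restriction quoted in option b concrete, here is a minimal Python sketch that models the rule on plain path strings and builds the matching `hdfs dfsadmin -allowSnapshot` command lines. It does not talk to a cluster; the helper names (`can_allow_snapshot`, `allow_snapshot_commands`) are made up for illustration, and the check only approximates what the NameNode enforces.

```python
from pathlib import PurePosixPath


def is_ancestor(a, b):
    """Return True if path a is a strict ancestor of path b."""
    a, b = PurePosixPath(a), PurePosixPath(b)
    return a != b and a in b.parents


def can_allow_snapshot(candidate, snapshottable):
    """Model of the documented rule: a directory cannot become
    snapshottable if an ancestor or descendant already is.
    (Hypothetical helper; works on path strings only.)"""
    for existing in snapshottable:
        if is_ancestor(existing, candidate) or is_ancestor(candidate, existing):
            return False
    return True


def allow_snapshot_commands(dirs):
    """Build (but do not run) one 'hdfs dfsadmin -allowSnapshot' call per directory."""
    return [["hdfs", "dfsadmin", "-allowSnapshot", d] for d in dirs]


# Option a: only the parent is snapshottable.
option_a = ["/data-dir"]
# Option b: each child is snapshottable on its own.
option_b = ["/data-dir/encr-zone1", "/data-dir/encr-zone2", "/data-dir/non-ez"]

# After option b, the parent can no longer be made snapshottable.
print(can_allow_snapshot("/data-dir", option_b))
# After option a, none of the children can be made snapshottable.
print(can_allow_snapshot("/data-dir/encr-zone1", option_a))
# Sibling directories do not conflict with each other.
print(can_allow_snapshot("/data-dir/encr-zone2", ["/data-dir/encr-zone1"]))
```

The two options are mutually exclusive for this tree, so the choice is worth settling before any directory is marked snapshottable.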

So, which option is recommended? Is it best to create encryption zones at the child-directory level (for example, one encryption zone per Hive database) and allow snapshots at the top-most parent level?

Are there any issues or security concerns with this approach? If so, what is the alternative?




Re: HDFS Snapshots and Encryption Zones

@Vijaya Narayana Reddy Bhoomi Reddy

I believe making the individual directories snapshottable may be the more secure option, since each snapshot would then be contained within its respective TDE (encryption) zone. I don't know for sure whether snapshots captured at the higher-level directory carry the same encryption as the child directories.
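As a rough illustration of the "contained within the zone" point, the Python sketch below computes where a snapshot of an encryption zone shows up in the namespace under each option, using the `<snapshottable-dir>/.snapshot/<name>` path convention. The snapshot name `s1` and the helpers `snapshot_view` and `inside` are hypothetical, and the comparison is on path strings only; it says nothing about how HDFS actually encrypts snapshot data.

```python
from pathlib import PurePosixPath


def snapshot_view(snapshottable_dir, snapshot_name, target):
    """Path at which `target` is visible inside a snapshot taken on
    `snapshottable_dir` (target must be that directory or below it)."""
    root = PurePosixPath(snapshottable_dir)
    rel = PurePosixPath(target).relative_to(root)
    return str(root / ".snapshot" / snapshot_name / rel)


def inside(path, zone):
    """True if `path` is the zone root or lies below it (string-level check)."""
    p, z = PurePosixPath(path), PurePosixPath(zone)
    return p == z or z in p.parents


zone = "/data-dir/encr-zone1"

# Option a: snapshot taken on the parent directory.
parent_level = snapshot_view("/data-dir", "s1", zone)
# Option b: snapshot taken on the encryption zone itself.
zone_level = snapshot_view(zone, "s1", zone)

print(parent_level, inside(parent_level, zone))
print(zone_level, inside(zone_level, zone))
```

With a parent-level snapshot the zone's contents are reached through a path outside the zone's own directory, whereas a per-zone snapshot keeps the `.snapshot` path under the zone root, which is the distinction the reply is pointing at.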