I am looking for best practices on where to allow snapshots in an HDFS directory tree when the tree contains encryption zones as child directories.
For example, assume we have an HDFS directory /data-dir and we want to use HDFS snapshots for backup and DR purposes. Under /data-dir there are three child directories: /data-dir/encr-zone1, /data-dir/encr-zone2 and /data-dir/non-ez. Here encr-zone1 and encr-zone2 are set up as HDFS encryption zones, while non-ez is not. We have two choices for creating snapshots:
a. Make /data-dir snapshottable. Once /data-dir is snapshottable, a .snapshot directory is created under /data-dir, where all the snapshot information is maintained.
b. Make /data-dir/child-dir1 and /data-dir/child-dir2 individually snapshottable. Going this route prevents us from making /data-dir, or any grandchild directory under the child directories, snapshottable in the future.
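To make the two choices concrete, here is a minimal sketch that only builds and prints the HDFS CLI commands each option would involve. It does not run them. The commands `hdfs dfsadmin -allowSnapshot <path>` and `hdfs dfs -createSnapshot <path> [<snapshotName>]` are the ones described in the HDFS Snapshot documentation. The helper name `snapshot_commands` and the snapshot name `backup-1` are my own placeholders.

```python
def snapshot_commands(directories, snapshot_name):
    """Build (but do not execute) the HDFS commands that make each
    directory snapshottable and then take one snapshot of it."""
    commands = []
    for directory in directories:
        # Requires HDFS superuser privileges on a real cluster.
        commands.append(["hdfs", "dfsadmin", "-allowSnapshot", directory])
        # Can be run by the directory owner once snapshots are allowed.
        commands.append(["hdfs", "dfs", "-createSnapshot", directory, snapshot_name])
    return commands


# Option a: a single snapshottable root.
option_a = snapshot_commands(["/data-dir"], "backup-1")

# Option b: each child directory is snapshottable on its own.
option_b = snapshot_commands(
    ["/data-dir/encr-zone1", "/data-dir/encr-zone2", "/data-dir/non-ez"],
    "backup-1",
)

# The two options are alternatives; on a real cluster they cannot be combined.
for label, commands in (("option a", option_a), ("option b", option_b)):
    print(label)
    for command in commands:
        print("  " + " ".join(command))
```

With option b the number of commands grows with the number of child directories, so in practice it would probably need to be scripted.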
Snippet from the HDFS Snapshot documentation: “Nested snapshottable directories are currently not allowed. In other words, a directory cannot be set to snapshottable if one of its ancestors/descendants is a snapshottable directory.”
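As I understand the quoted rule, it can be modelled with a small path check. This is only an illustrative sketch of the rule as worded in the documentation, not HDFS's actual implementation. The function name `can_allow_snapshot` is hypothetical.

```python
from pathlib import PurePosixPath


def can_allow_snapshot(candidate, snapshottable):
    """Return True if `candidate` has no ancestor or descendant among the
    already-snapshottable directories (the nesting rule quoted above)."""
    cand = PurePosixPath(candidate)
    for existing in snapshottable:
        other = PurePosixPath(existing)
        # Conflict if an existing snapshottable directory is an ancestor
        # of the candidate, or the candidate is an ancestor of it.
        if other in cand.parents or cand in other.parents:
            return False
    return True


# Option a chosen first: the children can no longer be made snapshottable.
print(can_allow_snapshot("/data-dir/encr-zone1", {"/data-dir"}))

# Option b chosen first: the parent can no longer be made snapshottable,
# but sibling directories remain independent of each other.
print(can_allow_snapshot("/data-dir", {"/data-dir/encr-zone1"}))
print(can_allow_snapshot("/data-dir/encr-zone2", {"/data-dir/encr-zone1"}))
```

So whichever level is picked first effectively locks out the other level until snapshots are disallowed there again.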
So, which option is recommended? Should we create encryption zones at the child-directory level (for example, one encryption zone per Hive database) and allow snapshots at the top-most parent level?
Are there any issues or security concerns with this approach? If so, what is the alternative?
I believe making the individual directories snapshottable may be more secure, since the snapshots taken would be contained within the respective TDE zone. I am not sure whether snapshots captured at the higher-level directory carry the same encryption as the child directories.