One of our customers is looking for ways to physically separate their data sets within a cluster to meet their regulatory standards. In a multi-tenant environment, this customer would like to store the data related to Group A on a set of DataNodes physically separated from the data related to Group B, and so on.
I know MapR is marketing this feature as a key differentiator, but do you know how to address this customer need on the Hortonworks platform? I heard NameNode federation is the way to go, but I was not able to find any information on using it for this purpose. Are there any other ways to achieve physical data separation within a single HDP cluster?
Hi @rbalam, can you tell us which regulatory standard you need to comply with? It isn't uncommon for shared data storage (think SAN storage) to require PII or HIPAA compliance on all data if any of the data is considered PII or HIPAA data, but physical separation is usually unnecessary. Separating the data in a data lake could be achieved via a number of solutions, e.g. encryption zones, Hive masking, node labels, etc.
The question for us to answer here is whether we can help the customer physically separate the data sets, if they need to, using the current HDP tech stack.