07-10-2017 04:52 PM
I have a Spark job that processes sensitive data, and its output should be restricted to a small number of users. The job is intended to write its output to a text file in HDFS, in a separate encryption zone. I would like to create an external table on top of this file for downstream consumption through Tableau.
However, when I create the external table, the file is owned by group Hive, and thus all admins on the cluster can see and select from the table. As such:
(1) What is the appropriate group membership for this file? There is a specific group that corresponds to authorized viewers of this table.
(2) Is there a way to let a tool like Tableau view this table in tabular form while preventing group Hive from accessing the data?
07-10-2017 09:50 PM
You can refer to the link below, where I've described the advantages and limitations of each security method.
07-11-2017 12:08 AM
07-11-2017 07:02 AM
Thank you. It looks to me like the HDFS ACL for this file is too expansive. I need to sync with administrators to understand the right ACL, but it's good to know that if I set the ACL right it won't matter that the hive group has access to the file.
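For anyone landing here later, this is a rough sketch of what tightening the ACL could look like, not the definitive fix. It only builds the `hdfs dfs -setfacl` and `hdfs dfs -getfacl` command lines; the path `/secure/zone/output.txt`, the group name `authorized_viewers`, and the helper `build_acl_commands` are all hypothetical, and the exact entries (in particular whether the owning group needs any access) are something to confirm with the cluster administrators.

```python
import subprocess


def build_acl_commands(path, viewer_group):
    """Build hdfs CLI commands that replace the ACL on `path`.

    Assumed policy (hypothetical): the owner keeps read/write, the owning
    group and everyone else get nothing, and one named group gets read.
    """
    acl_spec = f"user::rw-,group::---,group:{viewer_group}:r--,other::---"
    return [
        # --set replaces the whole ACL; it must include user, group and other.
        ["hdfs", "dfs", "-setfacl", "--set", acl_spec, path],
        # -getfacl prints the resulting ACL so it can be checked by eye.
        ["hdfs", "dfs", "-getfacl", path],
    ]


def apply_acl(path, viewer_group):
    """Run the commands; needs the hdfs client and sufficient privileges."""
    for cmd in build_acl_commands(path, viewer_group):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print only, so nothing is changed on the cluster by accident.
    for cmd in build_acl_commands("/secure/zone/output.txt", "authorized_viewers"):
        print(" ".join(cmd))
```

Readers in the named group also need execute permission on the parent directories to reach the file, so the directory ACLs may need a similar review.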