Every Hive table has to be owned by the “hive” user, and if anyone creates an external table, it won’t be owned by the hive user. Are we supposed to restrict the use of external tables?
I have the same problem: our Hive users (configured with doAs=false and security in Ranger) create lots of external tables to map their data. But Hive is unable to access this data by default, so we have to give the hive user explicit permissions to read the HDFS data of each external table. That is very cumbersome.
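To make the "explicit permissions" step concrete, here is a minimal Python sketch that only builds the `hdfs dfs -setfacl` command for one external table location; it does not run it. The path `/data/landing/sales` and the helper name `hive_read_acl_command` are made up for illustration. It assumes HDFS ACLs are enabled on the cluster (`dfs.namenode.acls.enabled=true`) and that read plus traverse access is all the hive user needs, which may not hold if Hive also writes to the table.

```python
import shlex


def hive_read_acl_command(table_location, hive_user="hive"):
    """Build (not run) an hdfs CLI call granting hive_user r-x on a location.

    Hypothetical helper: the access entry covers existing files and
    directories, the default entry is meant for newly created children.
    """
    acl_spec = f"user:{hive_user}:r-x,default:user:{hive_user}:r-x"
    return ["hdfs", "dfs", "-setfacl", "-R", "-m", acl_spec, table_location]


# Example location is an assumption, not a path from this thread.
cmd = hive_read_acl_command("/data/landing/sales")
print(shlex.join(cmd))
```

Something like this would still have to be run once per external table location, which is exactly the cumbersome part described above.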
I don't see any best practice regarding external tables in the document you referenced. Do you guys have any advice on how to handle external tables in such a case?
Can't we just overload the HDFS policies? For example, at one client we use doAs=false so that we can use column security via Hive, but the 'application' that loads the data also has an HDFS policy, so it can directly run MR jobs and the like to get the data loaded for end Hive users.
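As a rough illustration of that kind of HDFS policy, the Python sketch below builds a Ranger-style policy document as a plain dict and prints it as JSON. The service name `cluster_hadoop`, the policy name, the path, and the users `hive` and `etl_app` are all placeholders, and the field layout follows my understanding of the Ranger public v2 policy model, so check it against your Ranger version before posting it to the policy REST endpoint.

```python
import json


def build_hdfs_policy(service, name, path, users,
                      access_types=("read", "execute")):
    """Return a dict shaped like a Ranger HDFS policy (assumed v2 layout)."""
    return {
        "service": service,   # name of the HDFS service/repo in Ranger
        "name": name,
        "isEnabled": True,
        "resources": {
            "path": {"values": [path], "isRecursive": True},
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [
                    {"type": t, "isAllowed": True} for t in access_types
                ],
            }
        ],
    }


# All names below are hypothetical examples.
policy = build_hdfs_policy(
    service="cluster_hadoop",
    name="external-tables-landing",
    path="/data/landing",
    users=["hive", "etl_app"],
    access_types=("read", "write", "execute"),
)
print(json.dumps(policy, indent=2))
```

One policy on a shared parent directory, listing both the hive user and the loading application, would avoid granting access table by table, assuming users keep their external table data under that directory.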