I'm currently designing user/project spaces in a multi-tenant cluster. Sentry is in use, and HDFS permissions sync is enabled for specific (Hive-related) paths. I defined a few scenarios, taking multiple factors into consideration:
- user knowledge level
- user needs in terms of spaces/zones/services
- service interaction (Hive, Impala, HDFS)
- possible pipeline options
- folder structure
- naming conventions, and more
Now, in one scenario, it would make sense for me to enable Sentry-HDFS permissions sync for the whole namespace (sentry.hdfs.integration.path.prefixes = "/"). My "spider sense" is telling me that I shouldn't do it, since it might break something now or in the future. I know that Sentry does not manage permissions for HDFS paths that don't correspond to valid Hive objects, but it still sounds like a bad idea, even though I can't point to a specific issue.
Has anyone tried this in the past? Do you foresee any disaster or issue if this setting is used?
To motivate the case, imagine that you have something like "/team1/db" and "/team2/db", both already deployed and configured in HDFS to sync permissions with Sentry. When you onboard a new team, let's call it "team3", you have to add the "/team3/db" path to the config and restart a few services. I'd like to avoid restarting services every time a user or team is onboarded.
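
To make the trade-off concrete, here is a rough sketch of how I picture the prefix check. It is only my mental model, not Sentry's actual code: the helper `is_under_prefix` and the example paths below the team folders are made up for illustration, and I'm assuming the prefix list simply selects which paths are candidates for sync.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_under_prefix(path: str, prefixes: Iterable[str]) -> bool:
    """Return True if `path` equals, or sits below, any configured prefix.

    Hypothetical helper: models the idea of a path-prefix list, not the
    real Sentry/NameNode plugin logic.
    """
    candidate = PurePosixPath(path)
    for prefix in prefixes:
        root = PurePosixPath(prefix)
        # Compare whole path components so "/team1/dbx" does not match "/team1/db".
        if candidate == root or root in candidate.parents:
            return True
    return False


# Hypothetical values for sentry.hdfs.integration.path.prefixes
per_team = ["/team1/db", "/team2/db"]
whole_namespace = ["/"]

# Existing teams are covered by the per-team list.
print(is_under_prefix("/team1/db/sales.db/orders", per_team))

# A newly onboarded team is not covered until its prefix is added.
print(is_under_prefix("/team3/db/sales.db", per_team))

# With "/" every path is a candidate, including ones unrelated to Hive.
print(is_under_prefix("/team3/db/sales.db", whole_namespace))
print(is_under_prefix("/tmp/scratch", whole_namespace))
```

With the per-team list, "/team3/db" only becomes a candidate after a config change; with "/", nothing needs to change on onboarding, but every path in the namespace falls inside the prefix, which is exactly the part that worries me.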