Thanks @Shelton for your response. I had already come across that link, but it covers Ranger's integration with EMR. I am specifically looking for clarification on the Ranger APIs for authorizing access to data stored in S3. I haven't been able to find a clear picture of this, and I need one before we can begin the integration with Ranger.
We are looking to integrate Ranger with a SQL query engine so that authorization is enforced before a SQL query is sent for execution. We want to make sure the user is authorized to access the tables or columns referenced in a query before we actually execute it.

ASSUMPTION: the data resides in S3 storage, and the user submits a SQL query involving certain tables/columns to the query engine for processing.

Below are the points I could not find a firm answer to:

1. Can Ranger be used to authorize access to data located in cloud storage such as S3 or GCS? If yes, can user/role policies be configured to control table- and column-level access to data in such storage? (Please provide a supporting reference on how Ranger needs to be integrated with cloud storage.)
2. Assuming user/role policies already exist in Ranger for the table/column data, which REST APIs do we need to invoke to find out whether the user is authorized to access that data?
3. Is there a REST API that accepts a user ID and a list of tables/columns (in the context of a SQL query) and tells us whether the user is authorized to access them?

P.S.: I referred to https://ranger.apache.org/apidocs/index.html but could not find any API that accepts a user ID or the list of tables/columns involved in a query and tells us whether the user is authorized 😞

I would really appreciate any input on the above.
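For illustration, the pre-execution gate described in question 3 might look roughly like the minimal sketch below. It assumes the engine has already parsed the SQL into the (database, table, column) references it touches. `ColumnRef`, `is_access_allowed`, `authorize_query`, and the in-memory `policies` dict are all hypothetical stand-ins, not Ranger APIs. The open question is what Ranger provides in place of `is_access_allowed`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# HYPOTHETICAL policy store standing in for Ranger: maps a user to the
# (database, table, column) triples they may read. A column of "*" means
# every column of that table. This is NOT Ranger's policy model.
Policies = Dict[str, Set[Tuple[str, str, str]]]


@dataclass(frozen=True)
class ColumnRef:
    """One column reference extracted from the parsed SQL query."""
    database: str
    table: str
    column: str


def is_access_allowed(policies: Policies, user: str, ref: ColumnRef) -> bool:
    """Hypothetical single-resource check: one user, one column."""
    grants = policies.get(user, set())
    return (ref.database, ref.table, ref.column) in grants or (
        ref.database, ref.table, "*") in grants


def authorize_query(policies: Policies, user: str,
                    refs: Iterable[ColumnRef]) -> List[ColumnRef]:
    """Return the references the user may NOT read. An empty list means
    the query can be sent to the engine for execution."""
    return [ref for ref in refs if not is_access_allowed(policies, user, ref)]


if __name__ == "__main__":
    policies: Policies = {
        "alice": {("sales", "orders", "*"), ("sales", "customers", "name")},
    }
    refs = [
        ColumnRef("sales", "orders", "amount"),
        ColumnRef("sales", "customers", "ssn"),
    ]
    denied = authorize_query(policies, "alice", refs)
    if denied:
        print("Query rejected, access denied to:", denied)
    else:
        print("Query authorized, sending to engine")
```

Returning the denied references rather than a single boolean lets the engine report exactly which table or column blocked the query.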