Created 06-21-2018 06:34 AM
If the Kafka infrastructure is shared between two projects, how would we authorize and monitor who is reading what data? What is the security pattern on a shared Kafka infrastructure? Which project will have access to the Kafka clusters? Will it be a shared Ops team?
Hi @Geoffrey Shelton Okot, thank you so much. The information is very useful. I know Kafka ACLs are for authorization. How can we find out who has access to the Kafka clusters? How can we monitor who is reading what data? Ranger is also used for authorization, right? Are the options you mentioned present in Apache Kafka or Confluent Kafka?
Yes, Ranger is used for authorization.
To use Ranger policies you MUST first enable Kerberos. Once that is done, you can restrict users or groups from publishing, consuming, configuring, or describing topics, or even create policies that block or accept requests based on an IP range.
I think both Apache Kafka and Confluent Kafka have similar security models.
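To make the topic-level restriction above concrete, here is a minimal sketch of what such a policy could look like if created through Ranger Admin's public REST API. The service name, topic, user names, and access types are illustrative assumptions, not taken from any real cluster, and the exact policy JSON may vary by Ranger version.

```python
import json

# Hypothetical Ranger Admin endpoint (assumption, adjust host/port).
RANGER_URL = "http://ranger-admin:6080/service/public/v2/api/policy"

def kafka_topic_policy(topic, users, access_types):
    """Build a Ranger policy body allowing `users` the given access
    types (e.g. publish/consume/describe) on a single Kafka topic."""
    return {
        "service": "cluster_kafka",  # Ranger Kafka service name (assumed)
        "name": f"allow-{topic}",
        "isEnabled": True,
        "resources": {
            "topic": {"values": [topic], "isExcludes": False, "isRecursive": False}
        },
        "policyItems": [
            {
                "users": users,
                "groups": [],
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
                "delegateAdmin": False,
            }
        ],
    }

policy = kafka_topic_policy("Auths", ["consumer1"], ["consume", "describe"])
print(json.dumps(policy, indent=2))
# On a Kerberized cluster you would POST this body to RANGER_URL with
# admin credentials, e.g. requests.post(RANGER_URL, json=policy, auth=(...)).
```

Note that this only ever names a topic as the resource: there is no field of the policy that can reach inside the message payload, which is the limitation discussed below.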
Even if we introduce Ranger, how are we going to control authorization of the data itself? No one has really thought through the details… "the devil is in the details". Ranger provides topic-level authorization only.
Let me give you an example:
CDC extracts information from the authorization table and puts it into a topic "Auths" with 10 fields (assume field1, field2… field10).
Consumer 1 is only authorized to view fields field1 to field5, and Consumer 2 is authorized to view fields field2 to field10.
How are we going to implement a data-level policy when the actual data is inside the payload? Both consumers are authorized to read from the topic "Auths". Ranger enforces an all-or-nothing decision on the payload at the topic level, but here the payload is the entire mainframe record! Please help.
You are right, you can't control authorization at the data level, ONLY at the topic level. You could simplify your task by having the CDC extract into 2 distinct topics: one topic with fields 1-5 for Consumer 1 and one topic with fields 2-10 for Consumer 2. Then create these 2 policies in Ranger, which would be easier to manage.
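The split described above can be sketched as a simple projection step: each full record is cut down to the field subset each topic is allowed to carry. The topic names and the field ranges below follow the example in this thread; the `publish` stand-in comment is an assumption, since a real pipeline would use an actual Kafka producer (e.g. kafka-python's `KafkaProducer.send`).

```python
# Field subsets per downstream topic, per the Auths example:
# Consumer 1 gets fields 1-5, Consumer 2 gets fields 2-10.
TOPIC_FIELDS = {
    "Auths_consumer1": [f"field{i}" for i in range(1, 6)],
    "Auths_consumer2": [f"field{i}" for i in range(2, 11)],
}

def project(record, fields):
    """Keep only the permitted fields of one CDC record."""
    return {k: record[k] for k in fields if k in record}

def split_record(record):
    """Return {topic: projected_payload} for one full mainframe record."""
    return {topic: project(record, fields)
            for topic, fields in TOPIC_FIELDS.items()}

record = {f"field{i}": f"value{i}" for i in range(1, 11)}
for topic, payload in split_record(record).items():
    # In production: producer.send(topic, json.dumps(payload).encode())
    print(topic, sorted(payload))
```

With the data physically separated like this, a plain topic-level Ranger policy per topic is enough: Consumer 1 is granted consume on `Auths_consumer1` only, Consumer 2 on `Auths_consumer2` only.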
Thanks again. You mean to have filters in CDC, isn't it? Once we have the filters and get the data, we need to send it through Kafka Streams/Spark SQL and then feed it to the 2 different consumers, right? Please correct my understanding. Also, wouldn't having filters in CDC affect the performance of the source systems?
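One way to address the performance concern in the question above is to keep filtering out of CDC entirely: CDC writes the complete record once to a single restricted internal topic, and a lightweight downstream stream job does the fan-out, so the source system pays only for one full extract. The sketch below shows that stream step as a plain consume/filter/produce shape; the topic names, field ranges, and the kafka-python calls in the comments are all assumptions for illustration.

```python
# Downstream fan-out job: the source system and CDC are not involved,
# so the field filtering adds no load on the mainframe side.
ALLOWED = {
    "Auths_consumer1": {f"field{i}" for i in range(1, 6)},   # fields 1-5
    "Auths_consumer2": {f"field{i}" for i in range(2, 11)},  # fields 2-10
}

def fan_out(record):
    """One stream-processing step: full record in, per-topic filtered
    payloads out."""
    return {topic: {k: v for k, v in record.items() if k in allowed}
            for topic, allowed in ALLOWED.items()}

# The surrounding loop would look roughly like (kafka-python, assumed):
#   consumer = KafkaConsumer("Auths_internal", ...)
#   producer = KafkaProducer(...)
#   for msg in consumer:
#       for topic, payload in fan_out(json.loads(msg.value)).items():
#           producer.send(topic, json.dumps(payload).encode())
```

The internal topic holding the full record would itself be locked down with a Ranger policy so that only the fan-out job's principal can consume it.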