If the Kafka infrastructure is shared, how would we authorize and monitor who’s reading what data? What’s the security pattern on a shared Kafka infrastructure?


New Contributor

If the Kafka infrastructure is shared between 2 projects, how would we authorize and monitor who’s reading what data? What’s the security pattern on a shared Kafka infrastructure? Which project will have access to the Kafka clusters? Will it be shared Ops?

Mentor

@harsha vardhan bandaru

You have options such as Kafka ACLs or Ranger policies (see also this blog), and you could complement them with Kerberos.
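To illustrate how topic-level Kafka ACL decisions are typically evaluated, here is a minimal, self-contained Python model. The `Acl` class and `authorize` function are hypothetical illustrations, not the Kafka Admin API. The sketch assumes literal topic names plus the `*` wildcard, ignores host restrictions and super users, and uses a flag as a stand-in for the broker setting `allow.everyone.if.no.acl.found`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Acl:
    principal: str   # e.g. "User:project_a", or "User:*" for every principal
    operation: str   # e.g. "READ", "WRITE", "DESCRIBE", or "ALL"
    topic: str       # literal topic name, or "*" for every topic
    permission: str  # "ALLOW" or "DENY"

# An ALLOW on any of these operations also grants DESCRIBE on the same topic.
IMPLIES_DESCRIBE = {"READ", "WRITE", "DELETE", "ALTER"}

def authorize(acls, principal, operation, topic, allow_if_no_acl=False):
    """Return True if `principal` may perform `operation` on `topic` (simplified model)."""
    relevant = [a for a in acls if a.topic in (topic, "*")]
    if not relevant:
        # Stand-in for allow.everyone.if.no.acl.found (default false).
        return allow_if_no_acl

    def same_principal(a):
        return a.principal in (principal, "User:*")

    def denies(a):
        return (a.permission == "DENY" and same_principal(a)
                and a.operation in (operation, "ALL"))

    def allows(a):
        if a.permission != "ALLOW" or not same_principal(a):
            return False
        if a.operation in (operation, "ALL"):
            return True
        return operation == "DESCRIBE" and a.operation in IMPLIES_DESCRIBE

    # A matching DENY always wins over any matching ALLOW.
    if any(denies(a) for a in relevant):
        return False
    return any(allows(a) for a in relevant)
```

Without authentication (for example Kerberos over SASL), every client on a PLAINTEXT listener appears as `User:ANONYMOUS`, so per-project ACLs cannot tell the two projects apart; that is why ACLs are usually paired with Kerberos.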

HTH

New Contributor

Hi @Geoffrey Shelton Okot, thank you so much. The information is very useful. I know Kafka ACLs are for authorization. How can we find out who has access to the Kafka clusters? How can we monitor who is reading what data? Ranger is also used for authorization, right? Are the options you mentioned available in Apache Kafka or in Confluent Kafka?

Mentor

@harsha vardhan bandaru

Yes, Ranger is used for authorization.

To use Ranger policies you MUST first enable Kerberos. Once that is done, you can restrict users or groups to publish, consume, configure, or describe, or even create policies that block or allow access based on an IP range.
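The kind of policy described above (users or groups, access types such as publish, consume, configure and describe, plus an IP-range condition) can be sketched as a small Python model. `TopicPolicy` and `is_allowed` are hypothetical and are not Ranger's API; the evaluation rule (any matching deny wins, otherwise any matching allow grants access, otherwise access is denied) is a simplification, and the CIDR notation for IP ranges is chosen for convenience rather than taken from Ranger's condition syntax.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class TopicPolicy:
    """Hypothetical, simplified stand-in for a Ranger-style Kafka topic policy item."""
    topics: set
    accesses: set                                  # e.g. {"publish", "consume", "describe"}
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    ip_ranges: list = field(default_factory=list)  # CIDR strings; empty means any IP
    deny: bool = False

    def matches(self, topic, user, user_groups, access, client_ip):
        if topic not in self.topics or access not in self.accesses:
            return False
        if user not in self.users and not (self.groups & set(user_groups)):
            return False
        if self.ip_ranges:
            ip = ipaddress.ip_address(client_ip)
            return any(ip in ipaddress.ip_network(r) for r in self.ip_ranges)
        return True

def is_allowed(policies, topic, user, user_groups, access, client_ip):
    """Simplified rule: any matching deny wins; else any matching allow grants; else deny."""
    hits = [p for p in policies if p.matches(topic, user, user_groups, access, client_ip)]
    if any(p.deny for p in hits):
        return False
    return bool(hits)
```

Note that every decision here is made per topic: nothing in the policy looks inside the message payload.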

I think both Apache Kafka and Confluent Kafka have a similar security model.

HTH

New Contributor

Even if we introduce Ranger, how are we going to control authorization at the data level? No one has really thought through the details (“the devil is in the details”). Ranger provides topic-level authorization only.

Let me give you an example:

CDC extracts information from the authorization table and puts it into a topic “Auths” that has 10 fields (say field1, field2… field10).

Consumer 1 is only authorized to view fields field1 to field5, and Consumer 2 is authorized to view fields field2 to field10.

How are we going to implement a data-level policy when the actual data is inside the payload? Both consumers are authorized to read from the topic “Auths”. Ranger enforces all-or-nothing access to the payload at the topic level, but here the payload is the entire mainframe record! Please help.

Mentor

@harsha vardhan bandaru

You are right: you can't control authorization at the data level, ONLY at the topic level. You could simplify your task by having CDC write 2 distinct topics, one with fields 1-5 for consumer1 and one with fields 2-10 for consumer2, and then create 2 corresponding policies in Ranger. That would be easier to manage.
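The split suggested above amounts to projecting each full record onto two field lists, one per consumer topic. Here is a minimal Python sketch; the topic names `Auths_consumer1` and `Auths_consumer2` are hypothetical, and the field lists come from the example earlier in the thread.

```python
# Hypothetical topic names; field lists taken from the example in the thread.
CONSUMER_TOPICS = {
    "Auths_consumer1": [f"field{i}" for i in range(1, 6)],   # field1..field5
    "Auths_consumer2": [f"field{i}" for i in range(2, 11)],  # field2..field10
}

def split_record(record):
    """Project one full CDC record into one narrower record per consumer topic."""
    return {
        topic: {name: record[name] for name in fields if name in record}
        for topic, fields in CONSUMER_TOPICS.items()
    }

# Example: a full 10-field record becomes a 5-field and a 9-field record.
split = split_record({f"field{i}": i for i in range(1, 11)})
```

Each Ranger policy then grants `consume` on only one of these topics. Fields field2 to field5 end up stored in both topics, which is the cost of keeping authorization at the topic level.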

HTH

New Contributor

Thanks again. You mean having filters in CDC, right? Once we have the filters and get the data, we would send it to Kafka Streams/Spark SQL and then feed it through a Kafka filter to 2 different consumers, correct? Please correct my understanding. Also, wouldn't having filters in CDC affect the performance of the source systems?
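One alternative to filtering inside CDC is to land the full record in a single, tightly restricted topic and let a stream processor (Kafka Streams or Spark, as mentioned above) republish per-consumer projections. The key point is that the filter must run in a trusted process, not in the consumers: with topic-level authorization, any consumer allowed to read “Auths” receives the full record. The Python sketch below simulates this with an in-memory dict standing in for the broker; the principals (`router`, `consumer1`, `consumer2`), topic names, and grant table are all hypothetical.

```python
# Hypothetical in-memory "broker": topic name -> list of records.
# Topic names, principals and field lists are illustrative assumptions.
READ_GRANTS = {
    "Auths": {"router"},               # only the trusted router sees full records
    "Auths_consumer1": {"consumer1"},
    "Auths_consumer2": {"consumer2"},
}

PROJECTIONS = {
    "Auths_consumer1": [f"field{i}" for i in range(1, 6)],   # field1..field5
    "Auths_consumer2": [f"field{i}" for i in range(2, 11)],  # field2..field10
}

def read(broker, principal, topic):
    """Topic-level check: a principal either gets whole records or nothing."""
    if principal not in READ_GRANTS.get(topic, set()):
        raise PermissionError(f"{principal} may not read {topic}")
    return list(broker.get(topic, []))

def run_router(broker):
    """Read the full CDC topic as 'router' and republish per-consumer projections.

    No offset tracking: calling this twice republishes every record.
    """
    for record in read(broker, "router", "Auths"):
        for out_topic, fields in PROJECTIONS.items():
            projected = {f: record[f] for f in fields if f in record}
            broker.setdefault(out_topic, []).append(projected)
```

In this design the consumers receive no grant on “Auths” at all, so the projection, not consumer-side filtering, is what enforces the field-level restriction.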
