Posts: 352
Topics: 11
Kudos: 54
Solutions: 30
Registered: ‎09-02-2016

Hadoop Security for beginners

We can secure data in Hadoop using several different methods. Each method has its own advantages, and more than one method can be combined for better results.

 

S.No | Security Method | Description | Unique Advantage & Limitation
1. Kerberos

Description: Kerberos is a network authentication protocol.

Advantage: Authenticates users at the point of entry.

Limitation: Kerberos prevents unauthorized users from accessing the environment, but after login it does not provide fine-grained authorization at the table, column, folder, or file level.
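As a rough illustration of what "authenticating at the entry level" looks like from a user's shell, here is a minimal sketch using the standard Kerberos client tools. The principal name is hypothetical, and the `run` helper is only there so the script prints the commands instead of executing them unless `EXECUTE=1` is set.

```shell
# Dry run by default: each command is printed, not executed.
# Set EXECUTE=1 on a Kerberized cluster node to run them for real.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

PRINCIPAL="kumar@EXAMPLE.COM"   # hypothetical principal, not from the post

run kinit "$PRINCIPAL"    # authenticate and obtain a ticket-granting ticket
run klist                 # show the tickets held in the credential cache
run hdfs dfs -ls /user    # Hadoop commands now run under the Kerberos identity
run kdestroy              # discard the tickets when finished
```

Note that nothing here decides which tables or folders the user may read once the ticket is issued; that is the gap the other methods cover.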

2. Apache Sentry

Description: Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster.

Advantage: Application-level authorization for Hive, Impala, Solr, etc. It can control access at the DB, table, and column level for a particular user/group.

Limitation 1: It cannot control the HDFS folders that underlie applications like Hive, Impala, etc.

Ex: The Hive table prod.table1 is stored in /user/hive/warehouse/prod.db/table1. A Sentry role set up in Hue controls only table/column access, so a user may still manage to access the folders directly in HDFS.

Limitation 2: HDFS folders that are not related to Hive, Impala, etc. are not controlled.

3. Access Control List (ACL)

Description: An access control list (ACL) is a list of access control entries (ACEs). Each ACE in an ACL identifies a trustee and specifies the access rights allowed, denied, or audited for that trustee.

Advantage: Folder-level access can be granted to individual users with hadoop fs -setfacl.

 

4. HDFS Data At Rest Encryption (EDEK)

Description: HDFS Encryption implements transparent, end-to-end encryption of data read from and written to HDFS.

Advantage: Encrypting the data provides an additional level of security. In general, data encryption is required by a number of government, financial, and regulatory entities.

Ex: Unauthorized data access returns the result in encrypted format:

hadoop fs -cat /data/File.txt
1▒▒Q▒"▒▒▒
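For anyone who wants to see how such an encrypted folder is set up, the following is a minimal sketch using the `hadoop key` and `hdfs crypto` commands. It assumes a KMS is already configured, that the commands are run by a user allowed to create keys and zones, and that the target directory is empty; the key name and path are hypothetical. The `run` helper prints the commands rather than executing them unless `EXECUTE=1` is set.

```shell
# Dry run by default: each command is printed, not executed.
# Set EXECUTE=1 on a cluster node (with KMS configured) to run them for real.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

KEY_NAME="demo_key"        # hypothetical encryption key name
ZONE_DIR="/data/secure"    # hypothetical directory; must be empty

run hadoop key create "$KEY_NAME"                                   # create the key in the KMS
run hdfs dfs -mkdir -p "$ZONE_DIR"                                  # create the directory
run hdfs crypto -createZone -keyName "$KEY_NAME" -path "$ZONE_DIR"  # turn it into an encryption zone
run hdfs crypto -listZones                                          # confirm the zone exists
```

Files written under the zone afterwards are encrypted and decrypted transparently for clients that are allowed to use the key.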

 

Thanks

Kumar

Cloudera Employee
Posts: 41
Registered: ‎08-16-2016

Re: Hadoop Security for beginners

This is a well-written blog. I have a few points to add to it:

  • HDFS Data At Rest Encryption (EDEK)
    • It also protects the data from someone who walks away with the disk.
  • HDFS Wire Encryption
    • Setting up wire encryption prevents network traffic snoopers from accessing the data while it is in motion.
    • Wire encryption combined with encryption at rest provides end-to-end protection of data.
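One way to check whether wire encryption is switched on is to read the relevant properties with `hdfs getconf`. This is only a sketch: the three property names are the standard Hadoop ones as I understand them, but which of them applies depends on the Hadoop version and how the cluster is managed. The `run` helper prints the commands rather than executing them unless `EXECUTE=1` is set.

```shell
# Dry run by default: each command is printed, not executed.
# Set EXECUTE=1 on a cluster node to query the live configuration.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

# dfs.encrypt.data.transfer    : "true" encrypts block data moving to and from DataNodes
# dfs.data.transfer.protection : SASL protection level for data transfer; "privacy" means encrypted
# hadoop.rpc.protection        : "privacy" encrypts RPC traffic between clients and daemons
for key in dfs.encrypt.data.transfer dfs.data.transfer.protection hadoop.rpc.protection; do
  run hdfs getconf -confKey "$key"
done
```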
Posts: 352
Topics: 11
Kudos: 54
Solutions: 30
Registered: ‎09-02-2016

Re: Hadoop Security for beginners

@surajacharya: Thanks very much for adding these points!

Explorer
Posts: 23
Registered: ‎04-26-2016

Re: Hadoop Security for beginners


Kumar, these are good, informative points.

 

One question regarding ACLs: if Sentry is enabled, do we need to disable ACLs? In other words, if Sentry is enabled for Hive, are ACLs still required or not? I read in the Cloudera knowledge base, in the information on enabling Sentry, that Cloudera recommends not enabling ACLs when Sentry is enabled.

Thanks.

 

 

Explorer
Posts: 17
Registered: ‎11-03-2016

Re: Hadoop Security for beginners

All, 

 

Is there any security mechanism for fine-grained access control on Spark SQL queries? How can I restrict users to accessing only certain columns? I know there is RecordService in beta. Are there any other solutions that folks have used?

 

Cheers

Nagaraj C

Posts: 352
Topics: 11
Kudos: 54
Solutions: 30
Registered: ‎09-02-2016

Re: Hadoop Security for beginners

@cplusplus1

 

After access was granted on a particular db/table via Sentry for a user/group, I logged in to HDFS as a different user and tried to access the restricted db/table, but the different user couldn't access it. So, in my personal opinion, it is not required to apply ACLs on top of an already restricted db/table, and we can go with the Cloudera recommendation.

But consider the use case where you have an important file/folder in HDFS (not a table) that you want to restrict from other users. You can use ACLs in this use case.
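A minimal sketch of that use case with the HDFS ACL commands is below. The folder and user names are hypothetical, and it assumes ACLs are enabled on the NameNode (the `dfs.namenode.acls.enabled` property). The `run` helper prints the commands rather than executing them unless `EXECUTE=1` is set.

```shell
# Dry run by default: each command is printed, not executed.
# Set EXECUTE=1 on a cluster node to run them for real.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

SECURE_DIR="/data/finance"   # hypothetical folder to protect
ALLOWED_USER="alice"         # hypothetical user who should keep access

run hdfs dfs -chmod 750 "$SECURE_DIR"                             # shut out "other" users
run hdfs dfs -setfacl -m "user:${ALLOWED_USER}:r-x" "$SECURE_DIR" # let one extra user read and list
run hdfs dfs -getfacl "$SECURE_DIR"                               # review the resulting ACL
run hdfs dfs -setfacl -x "user:${ALLOWED_USER}" "$SECURE_DIR"     # revoke that entry again later
```

Adding `-R` to `-setfacl` applies the change recursively to existing files under the folder.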

Posts: 352
Topics: 11
Kudos: 54
Solutions: 30
Registered: ‎09-02-2016

Re: Hadoop Security for beginners

@chinumari

 

Apache Sentry will help you restrict user access at the db/table/column level for Hive, Impala, Solr, etc.

You can set this access for a group/user using a role.

So access to those dbs/tables/columns via Spark code will also be authorized by Sentry.
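To make the role setup concrete, here is a sketch of granting column-level access through Beeline. The JDBC URL, role, group, and column names are all hypothetical (only prod.table1 comes from the first post), and the statements follow the Sentry GRANT syntax as I understand it, so please check them against the documentation for your version. The `run` helper prints the command rather than executing it unless `EXECUTE=1` is set.

```shell
# Dry run by default: the command is printed, not executed.
# Set EXECUTE=1 on a node with Beeline installed to run it for real.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

# Hypothetical HiveServer2 URL for a Kerberized cluster
JDBC_URL="jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"

# Hypothetical role, group, and column names
SQL="CREATE ROLE analyst_role; GRANT ROLE analyst_role TO GROUP analysts; USE prod; GRANT SELECT(col1) ON TABLE table1 TO ROLE analyst_role;"

run beeline -u "$JDBC_URL" -e "$SQL"
```

Members of the analysts group would then be able to select only col1 from prod.table1 through Hive or Impala.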
