Hadoop Security for beginners


We can secure data in Hadoop using several methods, each with its own advantages. Methods can also be combined for better results.


Security methods, with the unique advantages and limitations of each:

1. Kerberos

Kerberos is a network authentication protocol.

Advantage: Authenticates users at the point of entry to the cluster.

Limitation: Kerberos prevents unauthorized users from accessing the environment, but after login it does not provide fine-grained authorization at the table, column, folder, or file level.
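As a rough illustration of entry-level authentication, a user normally obtains a Kerberos ticket before running any Hadoop command. The principal alice@EXAMPLE.COM is hypothetical, and the `run` helper is only a convenience added here: it prints each command unless DRY_RUN=0 is set, so the sketch is safe to try away from a cluster.

```shell
# Print commands by default; set DRY_RUN=0 on a cluster node to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

PRINCIPAL="alice@EXAMPLE.COM"   # hypothetical principal, not from the original post

run kinit "$PRINCIPAL"   # obtain a ticket-granting ticket (prompts for a password)
run klist                # show the tickets currently in the cache
run hadoop fs -ls /      # HDFS commands now authenticate with that ticket
run kdestroy             # discard the ticket when finished
```

Note that a valid ticket only proves who the user is; what that user may read or write is decided by the other mechanisms in this list.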

2. Apache Sentry

Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster.

Advantage: Application-level authorization for Hive, Impala, Solr, etc. It can control access at the database, table, and column level for a particular user/group.

Limitation 1: It cannot control the HDFS folders that underlie applications like Hive, Impala, etc.

Ex: The Hive table prod.table1 is stored in /user/hive/warehouse/prod.db/table1. A Sentry role set up in Hue controls only table/column access through Hue, but a user may still manage to access the underlying folders directly in HDFS.

Limitation 2: HDFS folders that are not related to Hive, Impala, etc. are not controlled.
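A minimal sketch of the kind of Sentry grants described above, submitted through Beeline. Only the table prod.table1 comes from the example; the role analyst_role, the group analysts, the column col_a, and the HiveServer2 URL are hypothetical. The `run` helper prints the Beeline command unless DRY_RUN=0 is set.

```shell
# Print commands by default; set DRY_RUN=0 on a cluster node to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical HiveServer2 URL; a Kerberized cluster also needs a ;principal=... suffix.
HS2_URL="jdbc:hive2://hs2.example.com:10000/default"

# Write the grant statements to a temporary file so Beeline can run them with -f.
SQL_FILE="$(mktemp)"
cat > "$SQL_FILE" <<'SQL'
CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;
USE prod;
GRANT SELECT(col_a) ON TABLE table1 TO ROLE analyst_role;
SQL

run beeline -u "$HS2_URL" -f "$SQL_FILE"
```

These grants govern access through HiveServer2; as Limitation 1 notes, they do not by themselves restrict the files under /user/hive/warehouse/prod.db/table1.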

3. Access Control List (ACL)

An access control list (ACL) is a list of access control entries (ACE).

Each ACE in an ACL identifies a trustee and specifies the access rights allowed, denied, or audited for that trustee

Advantage: Folder-level access can be granted to specific users with hadoop fs -setfacl.
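For instance, the following sketch gives one extra user read access to a folder without changing its owner or group. The user alice and the path /data/restricted are hypothetical, and it assumes ACLs are enabled on the NameNode (dfs.namenode.acls.enabled set to true). The `run` helper prints each command unless DRY_RUN=0 is set.

```shell
# Print commands by default; set DRY_RUN=0 on a cluster node to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

DIR="/data/restricted"   # hypothetical folder

run hadoop fs -chmod 750 "$DIR"                   # owner rwx, group r-x, others none
run hadoop fs -setfacl -m user:alice:r-x "$DIR"   # add a named-user entry for alice
run hadoop fs -getfacl "$DIR"                     # display the resulting ACL
```

Adding -R to the setfacl command applies the entry recursively to existing files and subfolders.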


4. HDFS Data At Rest Encryption (EDEK)

HDFS Encryption implements transparent, end-to-end encryption of data read from and written to HDFS.

Advantage: Encrypting the data provides an additional level of security. In general, data encryption is required by a number of government, financial, and regulatory entities.

Ex: A user who is not authorized to use the encryption key cannot read the plaintext of an encrypted file with a command such as:

hadoop fs -cat /data/File.txt
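For context, an encryption zone is usually set up along these lines. This is a sketch that assumes a KMS is already configured and that the commands are run by users with the appropriate key and HDFS administrator rights; the key name demo_key and the zone path /data/secure_zone are hypothetical. The `run` helper prints each command unless DRY_RUN=0 is set.

```shell
# Print commands by default; set DRY_RUN=0 on a cluster node to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

KEY_NAME="demo_key"         # hypothetical key name
ZONE="/data/secure_zone"    # hypothetical zone path

run hadoop key create "$KEY_NAME"                                # create the zone key in the KMS
run hadoop fs -mkdir -p "$ZONE"                                  # the zone directory must be empty
run hdfs crypto -createZone -keyName "$KEY_NAME" -path "$ZONE"   # turn the directory into an encryption zone
run hdfs crypto -listZones                                       # confirm the zone exists
```

Files written under the zone afterwards are encrypted and decrypted transparently for clients that are allowed to use the key.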





Re: Hadoop Security for beginners

This is a well-written blog. I have a few points to add to it:

  • HDFS Data At Rest Encryption (EDEK) 
    • It also protects the data from someone who walks away with the disk.
  • HDFS wire encryption
    • Setting up wire encryption prevents network traffic snoopers from reading the data while it is in motion. 
    • Wire encryption combined with encryption at rest provides end-to-end protection of the data.
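One way to check whether wire encryption is switched on is to read the relevant properties from the client configuration. The two property names below are standard Hadoop settings, but treat this as a sketch: the exact settings a cluster needs depend on its distribution and version. The `run` helper prints each command unless DRY_RUN=0 is set.

```shell
# Print commands by default; set DRY_RUN=0 on a cluster node to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run hdfs getconf -confKey dfs.encrypt.data.transfer   # "true" encrypts DataNode block transfers
run hdfs getconf -confKey hadoop.rpc.protection       # "privacy" encrypts Hadoop RPC traffic
```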

Re: Hadoop Security for beginners


@surajacharya: Thanks very much for adding these points!


Re: Hadoop Security for beginners


Kumar, these are good, informative points.


One question regarding ACLs: if Sentry is enabled, do we need to disable ACLs? In other words, if Sentry is enabled for Hive, are ACLs still required? I read in the Cloudera knowledge base, under the information on enabling Sentry, that Cloudera recommends not enabling ACLs when Sentry is enabled.





Re: Hadoop Security for beginners




Is there any security mechanism for fine-grained access control on Spark SQL queries? How can I restrict users to only certain columns? I know RecordService is in beta. Are there other solutions that people have used?



Nagaraj C


Re: Hadoop Security for beginners




Apache Sentry will help you restrict user access at the db/table/column level for Hive, Impala, Solr, etc.

You can set this access for a group/user using a role.

Access to those db/table/column objects via Spark code will then also be authorized by Sentry.


Re: Hadoop Security for beginners




After granting access on a particular db/table to a user/group via Sentry, I logged in to HDFS as a different user and tried to access the restricted db/table, but that user could not access it. So in my opinion it is not necessary to apply ACLs on top of an already restricted db/table, and we can go with Cloudera's recommendation.

But consider the use case where you have an important file/folder in HDFS (not a table) that you want to restrict from other users. You can use ACLs in that case.