Apache Ranger or Apache Sentry?

New Contributor

How is data quality managed? Which tool should be used for Hadoop security: Apache Ranger or Apache Sentry?

1 ACCEPTED SOLUTION

Master Mentor

@Mnju 

Ranger and Sentry don't manage data quality. Each is a centralized security framework for managing fine-grained access control and policies across the cluster. Security administrators use them to manage policies for access to files, folders, databases, tables, or columns. These policies can be set for individual users or groups and are then enforced consistently across the cluster.
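To make "fine-grained policy" concrete, here is a minimal sketch of a column-level Hive policy written as a Ranger-style policy document. The field names follow the JSON shape of Ranger's public v2 policy REST API as I understand it; the service name `cm_hive`, the database, table, columns, and group are all made up for illustration, so check the payload against the API documentation for your Ranger version before relying on it.

```python
import json


def build_hive_select_policy(service, database, table, columns, groups):
    """Build a Ranger-style policy dict granting SELECT on specific columns.

    This only builds the document; it does not contact a Ranger server.
    """
    def resource(values):
        return {"values": list(values), "isExcludes": False, "isRecursive": False}

    return {
        "service": service,  # Hive service name as defined in Ranger (hypothetical here)
        "name": f"{database}.{table} select for {', '.join(groups)}",
        "policyType": 0,  # 0 = access policy
        "isEnabled": True,
        "resources": {
            "database": resource([database]),
            "table": resource([table]),
            "column": resource(columns),
        },
        "policyItems": [
            {
                "groups": list(groups),
                "users": [],
                "accesses": [{"type": "select", "isAllowed": True}],
                "delegateAdmin": False,
            }
        ],
    }


# All names below are illustrative assumptions, not values from a real cluster.
policy = build_hive_select_policy(
    service="cm_hive",
    database="sales",
    table="orders",
    columns=["order_id", "amount"],
    groups=["analysts"],
)
print(json.dumps(policy, indent=2))
```

A document like this would typically be created through the Ranger Admin web UI or submitted to its policy REST endpoint (commonly `/service/public/v2/api/policy`, but verify for your release); the Ranger plugin running inside Hive then enforces it.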

 

The latest version of Ranger, which ships with CDP (now available for AWS, and later this year for Azure), manages access and authorization to the resources below using Ranger plugins.

HDFS | Hive | Ozone | Atlas
NiFi Registry | Storm | HBase | Knox
Kafka | YARN | NiFi | Solr

 

Sentry is a granular, role-based authorization module for Hadoop. It provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. It works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data), and it allows you to define authorization rules that validate a user's or application's access requests for Hadoop resources.
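The role-based model described here can be sketched in a few lines: privileges are granted to roles, roles are granted to groups, and a user gets access through group membership. This is only an illustrative model, not Sentry's implementation; the class names, role, group, and table below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    action: str    # e.g. "SELECT", "INSERT", or "ALL"
    database: str
    table: str     # "*" means every table in the database


@dataclass
class RoleStore:
    role_privileges: dict = field(default_factory=dict)  # role -> set of Privilege
    group_roles: dict = field(default_factory=dict)      # group -> set of role names

    def grant_privilege(self, role, privilege):
        self.role_privileges.setdefault(role, set()).add(privilege)

    def grant_role(self, role, group):
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, user_groups, action, database, table):
        """Allow only if some role held by one of the user's groups grants the action."""
        for group in user_groups:
            for role in self.group_roles.get(group, ()):
                for p in self.role_privileges.get(role, ()):
                    if (p.database == database
                            and p.table in ("*", table)
                            and p.action in ("ALL", action)):
                        return True
        return False  # no matching grant means no access


store = RoleStore()
store.grant_privilege("analyst", Privilege("SELECT", "sales", "orders"))
store.grant_role("analyst", "analysts")

print(store.is_allowed(["analysts"], "SELECT", "sales", "orders"))  # True
print(store.is_allowed(["analysts"], "INSERT", "sales", "orders"))  # False
print(store.is_allowed(["interns"], "SELECT", "sales", "orders"))   # False
```

Note that this model has no way to express "deny": access is simply the union of everything granted.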

Both are security tools built for Hadoop that are usually combined with Kerberos, KMS, and TLS to provide a robust security framework.

Data quality is a broad subject, but to my knowledge no tool manages data quality, unless you mean catalog tools such as Alation or Waterline. Atlas provides metadata management, data lineage, and governance capabilities to build a catalog of data assets, with classification and governance across those assets.


HTH

 


2 REPLIES

Cloudera Employee

A comparison of Apache Sentry and Apache Ranger based on the features each offers:

 

| Feature | Apache Sentry | Apache Ranger |
| --- | --- | --- |
| Role-Based Access Control (RBAC) | Yes | Yes |
| Deny Support | No | Yes |
| Admin Web User Interface | No | Yes |
| REST API Support | No | Yes |
| CLI Support | Yes | No |
| Audits Support | No | Yes |
| Plugins Supported | Impala, Hive, HDFS, Solr, Kafka | Impala, Hive, HDFS, Solr, Kafka, HBase, Knox, YARN, Storm, etc. |
| Tag-based policy | No | Yes |
| Row Level Filtering | No | Yes |
| Column Masking | No | Yes |
| HDFS ACL Sync | Yes | No (will be supported in upcoming CDP releases) |

 

As we can see, Apache Ranger supports more features, such as tag-based policies, row-level filtering, column masking, audits, an admin web interface, and more services or plugins in the CDP stack. That is why it's the default choice for the authorization service in CDP.
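Two of these features, deny support and column masking, are easy to illustrate. The sketch below is a simplified model under my own assumptions, not Ranger's actual evaluation engine: a matching deny rule overrides any allow rule, and a masking function hides all but the last four characters of a value. The function names, groups, and sample value are hypothetical.

```python
def evaluate(user_groups, access, deny_items, allow_items):
    """Return True if access is permitted; a matching deny always wins."""
    def matches(items):
        return any(
            access in item["accesses"] and set(user_groups) & set(item["groups"])
            for item in items
        )

    if matches(deny_items):
        return False
    return matches(allow_items)


def mask_show_last_4(value):
    """Hide all but the last four characters of a value."""
    value = str(value)
    return "x" * max(len(value) - 4, 0) + value[-4:]


# Illustrative rules: analysts may select, contractors are explicitly denied.
allow = [{"groups": ["analysts"], "accesses": ["select"]}]
deny = [{"groups": ["contractors"], "accesses": ["select"]}]

print(evaluate(["analysts"], "select", deny, allow))                 # True
print(evaluate(["analysts", "contractors"], "select", deny, allow))  # False
print(mask_show_last_4("4111111111111111"))                          # xxxxxxxxxxxx1111
```

The second call shows why deny support matters: a user who belongs to both groups is blocked, which a grant-only model cannot express.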

 

For a more detailed comparison, see this article by @EricL:

https://www.ericlin.me/2020/01/introduction-to-apache-ranger-part-i/