How is data quality managed? Which tool should be used for Hadoop security: Apache Ranger or Apache Sentry?
Created 10-28-2019 11:55 AM
Ranger and Sentry do not address data quality. Each is a centralized security framework for managing fine-grained access control and policies across the cluster. Security administrators use them to manage policies for access to files, folders, databases, tables, or columns. These policies can be set for individual users or groups and are then enforced consistently across the cluster.
The latest version of Ranger, which ships with CDP (now available for AWS and coming later this year for Azure), manages access and authorization to the resources below through Ranger plugins:
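To make the idea of a column-level policy concrete, here is a minimal Python sketch that builds a JSON body shaped like a Ranger resource-based policy for Hive. The service name `cm_hive`, the database, table, column, and group names are all made up for illustration, and the helper `build_ranger_policy` is hypothetical, not part of Ranger. The sketch only builds and prints the payload; in a real cluster a body like this would typically be sent to the Ranger Admin public REST API (the v2 policy endpoint), so check the field names against your Ranger version before relying on them.

```python
import json


def build_ranger_policy(service, name, database, table, columns, groups, access_types):
    """Return a dict shaped like a Ranger resource-based Hive policy.

    Hypothetical helper for illustration only; it performs no network calls.
    """
    return {
        "service": service,          # name of the Hive service defined in Ranger (assumed)
        "name": name,                # policy name, unique within the service
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                # Grant the listed access types to these groups.
                "groups": list(groups),
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
        ],
    }


policy = build_ranger_policy(
    service="cm_hive",               # assumed service name
    name="sales_orders_read",        # assumed policy name
    database="sales",
    table="orders",
    columns=["order_id", "amount"],
    groups=["analysts"],
    access_types=["select"],
)

print(json.dumps(policy, indent=2))
```

The point of the sketch is that one policy names a resource (database, table, columns) and who may do what with it, which is what lets the same rule be enforced consistently by every plugin that understands that resource type.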
HDFS | Hive | Ozone | Atlas |
NiFi Registry | Storm | HBase | Knox |
Kafka | YARN | NiFi | Solr |
Sentry is a granular, role-based authorization module for Hadoop. It provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. It works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data), and it lets you define authorization rules to validate a user's or application's access requests for Hadoop resources.
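As a rough illustration of the role-based model, the Python sketch below generates the kind of SQL statements used with Sentry in Hive or Impala: create a role, attach it to a group, then grant a privilege on a table to that role. The function `sentry_grant_statements` and the role, group, database, and table names are hypothetical, and the sketch does no input validation or execution; you would run the resulting statements yourself through Beeline or impala-shell as a Sentry admin.

```python
def sentry_grant_statements(role, group, database, table, privilege="SELECT"):
    """Return Sentry-style SQL statements granting a table privilege via a role.

    Hypothetical helper for illustration; names are not validated or escaped.
    """
    return [
        f"CREATE ROLE {role}",
        # Sentry grants privileges to roles, and roles to groups (not to users directly).
        f"GRANT ROLE {role} TO GROUP {group}",
        f"USE {database}",
        f"GRANT {privilege} ON TABLE {table} TO ROLE {role}",
    ]


statements = sentry_grant_statements(
    role="analyst_role",   # assumed role name
    group="analysts",      # assumed OS/LDAP group
    database="sales",
    table="orders",
)

for stmt in statements:
    print(stmt + ";")
```

Note that the privilege reaches users only through group membership, which is the main practical difference from Ranger policies, where users or groups can be named directly in the policy.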
Both are security tools built for Hadoop that are usually combined with Kerberos, KMS, and TLS to provide a robust security framework.
Data quality is a broad subject, but to my knowledge no tool manages data quality unless you are talking about catalog tools like Alation or Waterline. Atlas provides metadata management, data lineage, and governance capabilities for building a catalog of data assets and for classifying and governing those assets.
HTH
Created 07-22-2020 01:25 AM
Comparison between Apache Sentry and Apache Ranger based on the features they offer:
Feature | Apache Sentry | Apache Ranger |
Role-Based Access Control [RBAC] | Yes | Yes |
Deny Support | No | Yes |
Admin Web User Interface | No | Yes |
REST API Support | No | Yes |
CLI Support | Yes | No |
Audits Support | No | Yes |
Plugins Supported | Impala, Hive, HDFS, Solr, Kafka | Impala, Hive, HDFS, Solr, Kafka, HBase, Knox, YARN, Storm, etc. |
Tag-based policy | No | Yes |
Row Level Filtering | No | Yes |
Column Masking | No | Yes |
HDFS ACL Sync | Yes | No [Will be supported in upcoming CDP releases] |
As we can see, Apache Ranger supports more features, such as tag-based policies, row-level filtering, column masking, audits, an admin web interface, and more services and plugins in the CDP stack. That is why it is the default choice for the authorization service in CDP.
For a more detailed comparison, see this article by @EricL:
https://www.ericlin.me/2020/01/introduction-to-apache-ranger-part-i/