How is data quality managed? Which tool should be used for Hadoop security - Apache Ranger or Apache Sentry?
Ranger and Sentry do not address data quality. Instead, each provides a centralized security framework for managing fine-grained access control and policies across the cluster. Security administrators use them to manage policies for access to files, folders, databases, tables, or columns. These policies can be set for individual users or groups and are then enforced consistently across the cluster.
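To make the idea of a fine-grained policy concrete, here is a minimal Python sketch of how a column-level policy check could work. This is only a toy model under my own assumptions: the `Policy` class, the `is_allowed` function, and the sample database, table, and user names are all hypothetical and are not part of Ranger's or Sentry's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: who may do what on which resource."""
    database: str
    table: str
    columns: frozenset   # "*" in this set means every column
    users: frozenset
    groups: frozenset
    accesses: frozenset  # e.g. {"select"}


def is_allowed(policies, user, user_groups, database, table, column, access):
    """Return True if any policy grants `access` on the column
    to the user directly or to one of the user's groups."""
    for p in policies:
        resource_match = (
            p.database == database
            and p.table == table
            and ("*" in p.columns or column in p.columns)
        )
        principal_match = user in p.users or bool(p.groups & set(user_groups))
        if resource_match and principal_match and access in p.accesses:
            return True
    return False  # nothing matched, so deny by default


# Illustrative data only: one policy on two columns of sales.orders.
POLICIES = [
    Policy(
        database="sales",
        table="orders",
        columns=frozenset({"order_id", "amount"}),
        users=frozenset({"alice"}),
        groups=frozenset({"analysts"}),
        accesses=frozenset({"select"}),
    ),
]
```

With this data, `is_allowed(POLICIES, "bob", ["analysts"], "sales", "orders", "amount", "select")` is granted through the group, while the same request for a column not listed in the policy is denied.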
The latest version of Ranger, which ships with CDP (now available for AWS and later this year for Azure), manages access and authorization to cluster resources through Ranger plugins.
Sentry is a granular, role-based authorization module for Hadoop. It provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. It works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data), and it allows you to define authorization rules that validate a user's or application's access requests for Hadoop resources.
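Role-based authorization of this kind can be pictured as a chain: users belong to groups, groups are granted roles, and roles hold privileges. The Python sketch below illustrates that chain only. The dictionaries, the `has_privilege` function, and all names in it are hypothetical and do not reflect Sentry's real interfaces.

```python
# Illustrative data only: user -> groups, group -> roles, role -> privileges.
USER_GROUPS = {"alice": {"analysts"}, "bob": {"interns"}}
GROUP_ROLES = {"analysts": {"analyst_role"}}
ROLE_PRIVILEGES = {
    # Each privilege is a (database, table, action) tuple.
    "analyst_role": {("sales", "orders", "select")},
}


def has_privilege(user, database, table, action):
    """Return True if any role reachable through the user's groups
    holds the requested privilege."""
    for group in USER_GROUPS.get(user, ()):
        for role in GROUP_ROLES.get(group, ()):
            if (database, table, action) in ROLE_PRIVILEGES.get(role, ()):
                return True
    return False  # no role grants it, so the request is rejected
```

Here `has_privilege("alice", "sales", "orders", "select")` succeeds because `alice` is in `analysts`, which carries `analyst_role`. The same request from `bob` fails because his group has no role.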
Both are security tools built for Hadoop that are usually combined with Kerberos, KMS, and TLS to provide a robust security framework.
Data quality is a broad subject, but to my knowledge no tool manages data quality unless you are talking about catalog tools such as Alation or Waterline. Apache Atlas provides metadata management, data lineage, and governance capabilities, which you can use to build a catalog of data assets and to classify and govern those assets.