Created on 03-30-2020 11:15 PM - edited on 03-30-2020 11:19 PM by VidyaSargur
Summary
The following summarizes the areas to consider when securing an enterprise HDP cluster:
FreeIPA / Active Directory for Authentication
LDAP - User/Group level Authentication for WebUI & JDBC connections
Kerberos - Service level Authentication for Services & JDBC
Note: FreeIPA is recommended here because it bundles an LDAP directory server, MIT Kerberos, NTP, DNS, the Dogtag certificate system, SSSD, and other components in a single solution. Alternatively, Microsoft Active Directory (AD) can be used.
Atlas for Data Lineage & Data Governance (Data Classification, Data Masking - PII, & Data Security)
Ranger for Authorization & Auditing
Resource-Based security policies on HDP Services
Tag-based security policies for data tagged using Atlas
Knox - SSO (Single Sign-On) and Perimeter Security
Encryption (performance impact needs to be considered)
Ranger KMS - for encryption of data at rest
TLS/SSL - for encryption of data in transit (wire encryption)
Data Steward Studio - Cross Cluster Data Governance (multiple clusters)
Architecture
The diagrams below show the setup architecture of the cluster and, in detail, the communication between components.
User connects to the cluster directly: Without Knox
User connects via Knox: With Knox
HDP Components for Security
Securing the cluster involves Kerberizing it and deploying two other key components, Apache Ranger and Apache Knox. Strong authentication that establishes a user’s identity is the basis for secure access in Hadoop.
Why Kerberos?
Establishes identity for clients, hosts, and services
Prevents impersonation; passwords are never sent over the wire
Integrates with enterprise identity management tools such as LDAP and Active Directory
More granular auditing of data access/job execution
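To make the first point concrete, Kerberos identities are expressed as principals of the form primary/instance@REALM, where the instance part is optional. In Hadoop clusters, service principals conventionally carry the host name as the instance, while plain user principals usually have none. The following is a minimal Python sketch of parsing such names; the helper name parse_principal and the example principals are illustrative assumptions, not part of any Kerberos or HDP library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    """A parsed Kerberos principal: primary[/instance]@REALM."""
    primary: str
    instance: Optional[str]
    realm: str

    @property
    def has_instance(self) -> bool:
        # Service principals in Hadoop conventionally carry a host name here,
        # but an instance alone does not prove the principal is a service.
        return self.instance is not None


def parse_principal(name: str) -> Principal:
    """Split a principal name into primary, optional instance, and realm.

    Hypothetical helper for illustration only; it does not handle
    escaped '/' or '@' characters.
    """
    local, at, realm = name.rpartition("@")
    if not at or not local or not realm:
        raise ValueError(f"missing realm or name in principal: {name!r}")
    primary, slash, instance = local.partition("/")
    if not primary or (slash and not instance):
        raise ValueError(f"malformed principal: {name!r}")
    return Principal(primary, instance or None, realm)


# Example values are made up for illustration.
service = parse_principal("nn/master1.example.com@EXAMPLE.COM")
user = parse_principal("alice@EXAMPLE.COM")
print(service.primary, service.instance, service.realm)
print(user.primary, user.has_instance)
```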
The Apache Knox Gateway (“Knox”) is a system that extends the reach of Apache™ Hadoop® services to users outside of a Hadoop cluster without weakening Hadoop security. Knox also simplifies Hadoop security for users who access the cluster data and execute jobs. The Knox Gateway is designed as a reverse proxy.
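As a reverse proxy, Knox exposes cluster services under a single HTTPS endpoint, so clients address the gateway rather than individual service hosts. The sketch below builds a WebHDFS URL routed through Knox, following the gateway URL layout https://{gateway-host}:{port}/{gateway-path}/{topology}/webhdfs/v1/{path}. The host name is made up, and the defaults shown (port 8443, gateway path "gateway", topology "default") are common out-of-the-box values that may differ in a given deployment; knox_webhdfs_url is a hypothetical helper, not a Knox API.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(
    gateway_host: str,
    hdfs_path: str,
    op: str = "LISTSTATUS",
    port: int = 8443,               # assumed default; check your gateway config
    gateway_path: str = "gateway",  # assumed default
    topology: str = "default",      # assumed topology name
) -> str:
    """Build a WebHDFS REST URL that goes through the Knox gateway."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute, e.g. /tmp/data")
    base = f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1"
    # quote() keeps '/' intact and percent-encodes other unsafe characters.
    return f"{base}{quote(hdfs_path)}?{urlencode({'op': op})}"


# Example with a made-up gateway host.
print(knox_webhdfs_url("knox.example.com", "/tmp/data"))
```

The client only needs to know the gateway address and topology name; the internal NameNode host and port stay hidden behind Knox.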
Apache Ranger™ is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture. Enterprises can potentially run multiple workloads in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access.
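The idea of centrally administered, resource-based policies can be illustrated with a toy model. The sketch below is not Ranger's policy engine or API: the Policy class, the is_allowed function, the glob-style matching, and the deny-by-default behavior are all simplifying assumptions made for illustration. Real Ranger policies also support deny conditions, exclusions, tag-based rules, and service-specific fallbacks.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Policy:
    """Toy allow-policy: who may perform which accesses on which resource."""
    resource: str                       # glob pattern, e.g. "/data/sales/*"
    accesses: FrozenSet[str]            # e.g. {"read", "write"}
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()


def is_allowed(
    policies: List[Policy],
    user: str,
    user_groups: Iterable[str],
    resource: str,
    access: str,
) -> bool:
    """Return True if any policy grants the access; deny otherwise (assumption)."""
    groups = set(user_groups)
    for policy in policies:
        if not fnmatchcase(resource, policy.resource):
            continue  # policy does not cover this resource
        if access not in policy.accesses:
            continue  # policy does not grant this access type
        if user in policy.users or policy.groups & groups:
            return True
    return False


# Made-up policy and users for illustration.
policies = [
    Policy("/data/sales/*", frozenset({"read"}), groups=frozenset({"analysts"})),
]
print(is_allowed(policies, "alice", ["analysts"], "/data/sales/q1.csv", "read"))
print(is_allowed(policies, "alice", ["analysts"], "/data/sales/q1.csv", "write"))
```

Keeping all policies in one list mirrors the point of central administration: access decisions for every user and resource are evaluated against a single policy store rather than per-service settings.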