Summary

Below is a summary of the different verticals to consider while securing an Enterprise HDP cluster:

  1. FreeIPA / Active Directory for Authentication
    • LDAP - User/Group-level Authentication for WebUI & JDBC connections (see the bind sketch after this list)
    • Kerberos - Service level Authentication for Services & JDBC
      Note: FreeIPA is recommended here because it is a single solution providing an LDAP directory server, MIT Kerberos, NTP, DNS, the Dogtag certificate system, SSSD, and more.
      Alternatively, Microsoft's Active Directory (AD) can be used as well.
  2. Atlas for Data Lineage & Data Governance (Data Classification, Data Masking - PII, & Data Security)
  3. Ranger for Authorization & Auditing
    • Resource-Based security policies on HDP Services
    • Tag-based security policies for data tagged using Atlas
  4. Knox - SSO (Single Sign-On) and Perimeter Security
  5. Encryption (performance impact needs to be considered)
    • Ranger KMS - for Data-at-Rest Encryption
    • TLS/SSL - for Data-in-Transit (Wire) Encryption
  6. Data Steward Studio - Cross Cluster Data Governance (multiple clusters)
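
As an illustration of item 1, the sketch below performs a simple LDAP bind against a directory server (FreeIPA or AD) to validate a user's credentials, which is essentially what LDAP authentication for a WebUI or JDBC endpoint does behind the scenes. The host name, user DN, and password are placeholder values, not details from this article.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

public class LdapBindCheck {
    public static void main(String[] args) throws NamingException {
        // Hypothetical FreeIPA/AD endpoint and user DN -- replace with your own values.
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldaps://ipa.example.com:636");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "uid=jdoe,cn=users,cn=accounts,dc=example,dc=com");
        env.put(Context.SECURITY_CREDENTIALS, "user-password");

        // A successful bind means the directory accepted the credentials.
        new InitialDirContext(env).close();
        System.out.println("LDAP bind succeeded");
    }
}
```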

Architecture

The diagrams below represent the setup architecture of the cluster and the communication between components in detail.

  1. The user connects to the cluster directly: [Diagram: Without Knox]
  2. The user connects via Knox: [Diagram: With Knox]

HDP Components for Security

As part of securing the cluster, the cluster is to be Kerberized; the other key components used are Apache Ranger and Apache Knox. Strongly authenticating and establishing a user's identity is the basis for secure access in Hadoop.

Why Kerberos?

  • Establishes identity for clients, hosts, and services
  • Prevents impersonation; passwords are never sent over the wire
  • Integrates with enterprise identity management tools such as LDAP and Active Directory
  • More granular auditing of data access/job execution
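
As a minimal illustration of how a client or service establishes its Kerberos identity on a Kerberized cluster, the sketch below logs in from a keytab using Hadoop's UserGroupInformation API. The principal and keytab path are placeholders; adjust them to your realm and keytab location.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosKeytabLogin {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop client libraries to use Kerberos authentication.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical service principal and keytab -- replace with values issued by your KDC (FreeIPA/AD).
        UserGroupInformation.loginUserFromKeytab(
                "svc-etl@EXAMPLE.COM", "/etc/security/keytabs/svc-etl.keytab");

        System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
    }
}
```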

The Apache Knox Gateway (“Knox”) is a system to extend the reach of Apache™ Hadoop® services to users outside of a Hadoop cluster without reducing Hadoop Security. Knox also simplifies Hadoop security for users who access the cluster data and execute jobs. The Knox Gateway is designed as a reverse proxy.
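The sketch below shows what "extending reach without reducing security" can look like in practice: a JDBC client connects to HiveServer2 through the Knox gateway over HTTPS instead of talking to the cluster nodes directly. The gateway host, port, topology name ("default"), truststore path, and credentials are assumptions; adjust them to your Knox deployment, and make sure the hive-jdbc driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveViaKnox {
    public static void main(String[] args) throws Exception {
        // Hive over HTTP transport, routed through the Knox gateway's "default" topology.
        String url = "jdbc:hive2://knox.example.com:8443/;ssl=true;"
                + "sslTrustStore=/etc/pki/knox-truststore.jks;trustStorePassword=changeit;"
                + "transportMode=http;httpPath=gateway/default/hive";

        // Knox authenticates the end user (e.g. against LDAP) before proxying the request.
        try (Connection conn = DriverManager.getConnection(url, "jdoe", "user-password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```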

Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture. Enterprises can potentially run multiple workloads, in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access. 
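To give a sense of the central-administration side, the sketch below lists the policies currently defined in Ranger Admin through its public REST API. The host, port, and credentials are placeholders, and the endpoint path should be verified against your Ranger version; a production cluster would also front this with TLS (item 5 above).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RangerPolicyList {
    public static void main(String[] args) throws Exception {
        // Hypothetical Ranger Admin host and credentials -- replace with your own.
        URL url = new URL("http://ranger.example.com:6080/service/public/v2/api/policy");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Basic authentication against Ranger Admin.
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin-password".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        conn.setRequestProperty("Accept", "application/json");

        // Prints the JSON array of resource- and tag-based policies.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```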
