Authentication

1 - Kerberos: Kerberos is mandatory for production environments. You can either use the KDC embedded in your Active Directory or install a new dedicated KDC. Either way, Kerberos must be deployed in HA.

Risk of not doing the above: user impersonation of the service accounts (jobs can be submitted to run with superuser permissions).
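
As a quick sanity check that Kerberos is working, a minimal sketch, assuming an HDP-style keytab path and a hypothetical EXAMPLE.COM realm:

    # List the principals stored in the service keytab:
    klist -kt /etc/security/keytabs/hdfs.headless.keytab

    # Authenticate as the service principal and confirm a ticket was granted:
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
    klist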

2 - Use a firewall to block all inbound traffic to the cluster, from all sources and on all ports, except from the edge node (gateway).

Risk of not doing the above: a password in the wrong hands will invariably give access to the cluster.
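
As an illustration only, a minimal iptables sketch for a perimeter firewall; the edge-node IP 10.0.0.5 is hypothetical, and the rules must run on the perimeter device, not on the cluster nodes themselves (see item 5 under Network):

    iptables -P INPUT DROP                                            # default: drop all inbound
    iptables -A INPUT -s 10.0.0.5 -j ACCEPT                           # allow the edge node only
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT  # allow replies to outbound traffic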

3 - Check the permissions of the keytabs, as detailed in this article: Script to fix permissions and ownership of hadoop keytabs

Risk of not doing the above: the keytabs can be used by other cluster users.
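
The idea of that script, sketched for a single keytab; the path and the hdfs:hadoop ownership follow the usual HDP layout and are assumptions:

    chown hdfs:hadoop /etc/security/keytabs/hdfs.headless.keytab
    chmod 400 /etc/security/keytabs/hdfs.headless.keytab   # readable by the owner only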

4 - Use Knox for all API calls to the cluster.

Benefits: inbound traffic comes only from "trusted", known machines, and every call requires authentication against an existing LDAP.
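
For example, a WebHDFS call routed through Knox; the gateway host, the "default" topology, and the LDAP user are hypothetical:

    curl -iku alice:password \
      'https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS'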

Network

1 - The cluster must be in an isolated subnet, with no interference from other networks, for both security and throughput.

Risk of not doing the above: data interception by or from other machines in the data center.

2 - Cluster machines can be linked internally in "non-routed" mode, with host resolution configured via /etc/hosts on all machines, as sketched below.
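
A sketch of such an /etc/hosts, with hypothetical non-routed 10.0.0.0/24 addresses and hostnames, identical on every machine:

    127.0.0.1    localhost
    10.0.0.11    master1.cluster.internal    master1
    10.0.0.12    master2.cluster.internal    master2
    10.0.0.21    worker1.cluster.internal    worker1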

3 - A flat network is not recommended.

Risk of not doing the above: file inclusion attacks from other machines in the data center.

4 - Having two DNS resolutions (internal and external) is acceptable if the DNS server is in HA.

You can also combine /etc/hosts with the DNS configuration.
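
A quick way to check that both resolutions stay consistent; the resolver addresses and hostnames below are hypothetical:

    dig +short master1.cluster.internal @10.0.0.2      # internal resolver
    dig +short master1.example.com      @203.0.113.2   # external resolver
    dig +short -x 10.0.0.11             @10.0.0.2      # reverse lookup must match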

5 - iptables must be disabled within the cluster.

This is a prerequisite for the installation.
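
On the cluster nodes, this typically means (RHEL/CentOS examples):

    # RHEL/CentOS 7 (firewalld):
    systemctl stop firewalld
    systemctl disable firewalld

    # RHEL/CentOS 6 (iptables service):
    service iptables stop
    chkconfig iptables off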

6 - /etc/hosts must be configured with the FQDNs. The Ambari server needs to be able to resolve every node of the cluster via its /etc/hosts.

This is a prerequisite for the installation.
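
To verify the resolution on every node before installing (the hostname is hypothetical):

    hostname -f                              # must print the FQDN, e.g. master1.cluster.internal
    getent hosts master1.cluster.internal    # must print the expected address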

Authorizations

1 - Systematically give 000 permissions to the HDFS files and folders of the data lake (/data), so that only Ranger controls access, via policies.

Risk of not doing the above: users can gain access through HDFS permissions or ACLs and bypass Ranger policies.
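
A minimal sketch, run as the HDFS superuser and assuming /data as the data-lake root:

    hdfs dfs -chmod -R 000 /data
    hdfs dfs -ls /               # /data should now show d---------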

2 - You can use a umask: fs.permissions.umask-mode = 0022

Risk of not doing the above: wrong default permissions may lead to Ranger policies being bypassed.
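
The property is set in the HDFS configuration (via Ambari); you can verify the effective value from any client:

    hdfs getconf -confKey fs.permissions.umask-mode   # expect: 0022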

Other Best Practices

Do not share the passwords of the superusers (hdfs, hive, spark, etc.) with all teams; only root should own them.

You can disable SSH login for some superusers (knox, spark, etc.), as sketched below.
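
One way to do it is to replace the login shell of the (hypothetical) service accounts:

    usermod -s /sbin/nologin knox
    usermod -s /sbin/nologin spark

    # Alternatively, deny them in /etc/ssh/sshd_config:
    #   DenyUsers knox spark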

Please feel free to comment with enhancements.
