10-14-2016
08:19 PM
NiFi Identity Conversion
In a secure NiFi environment, the identity of a user can be determined in a number of ways depending on the authentication configuration. Machines also have an identity that must be determined upon authentication. Determining the identity of an entity is important to ensure proper authorization and access to resources.
Machine Identity
The identity of a node in a NiFi cluster is determined by the SSL certificate that is used for secure communication with other nodes in the cluster. This certificate can be generated by the internal Certificate Authority provided with HDF, or by an external CA. Once SSL is enabled on the cluster using the certificates, the certificates will be stored (by default) in the /etc/nifi/conf/keystore.jks keystore.
To get the node's identity as specified in the certificate, first get the keystore password from the nifi.properties file, then run the keytool command:
cat /etc/nifi/conf/nifi.properties | grep keystorePasswd
nifi.security.keystorePasswd=lF6e7sJsD3KxwNsrVqeXbYhGNu3QqTlhLmC5ztwlX/c
keytool -list -v -keystore /etc/nifi/conf/keystore.jks
This command will print out all of the information about the node's certificate. The Owner field contains the node's identity.
Alias name: nifi-key
Creation date: Oct 7, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 2
Certificate[1]:
Owner: CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com
Issuer: CN=nifi-1.example.com, OU=NIFI
Serial number: 157a059d1cb00000000
Valid from: Fri Oct 07 18:13:43 UTC 2016 until: Mon Oct 07 18:13:43 UTC 2019
Certificate fingerprints:
MD5: C2:BD:6A:CE:86:05:C9:C1:E8:DE:0C:C1:62:B5:27:5B
SHA1: 3A:BA:E4:35:DA:91:D2:DB:E3:A1:BA:C8:7F:19:C4:C2:BD:81:5A:8F
SHA256: 2A:4F:05:51:9E:4F:50:8B:0D:B0:4C:55:AD:21:65:CF:5D:C2:85:8B:BA:0F:CB:5A:95:AC:C4:3D:08:62:13:02
Signature algorithm name: SHA256withRSA
Version: 3
Extensions:
...
In the example above, the identity of the node (Owner of the certificate) is CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com.
If the certificates are managed by the internal CA, the node identity is determined by two parameters in the NiFi configuration that convert the hostname into a distinguished name (DN) format:
The node identity from the certificate above was generated using the parameters shown in the Ambari NiFi configuration. The NiFi CA uses the CA DN Prefix + hostname + CA DN Suffix to generate the Owner field stored in the certificate. It is important to note the transformation that occurs between the configuration parameters and the final DN.
Hostname: nifi-2.example.com
CA DN Prefix: CN=
CA DN Suffix: ,cn=hosts,cn=accounts,dc=example,dc=com
Is translated into a node identity of:
CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com
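As a minimal illustration (a sketch, not NiFi's actual implementation), the prefix + hostname + suffix construction and the case/spacing normalization can be expressed in Python:

```python
def normalize_dn(prefix, hostname, suffix):
    """Illustrative sketch of the CA's DN construction: concatenate
    prefix + hostname + suffix, uppercase each attribute name, and
    join the components with a comma and a space."""
    raw = prefix + hostname + suffix
    parts = []
    for component in raw.split(","):
        attr, _, value = component.strip().partition("=")
        parts.append(attr.upper() + "=" + value)
    return ", ".join(parts)

print(normalize_dn("CN=", "nifi-2.example.com",
                   ",cn=hosts,cn=accounts,dc=example,dc=com"))
# CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com
```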
The lowercase attribute identifiers (cn, dc, etc.) are converted to uppercase (CN, DC, etc.) and a space is added between each component of the distinguished name. These transformations will become important later when identity conversions are created.
User Identity
The user's identity can be determined in multiple ways depending on how security is configured within the cluster:
If certificate based user authentication is used, the user identity is determined from the certificate just as it is for node identity.
If LDAP authentication is used, the user identity is determined by the distinguished name attribute passed back from the LDAP server.
If Kerberos authentication is used, the user identity is determined based on the Kerberos principal.
Certificate Based User Authentication
The user identity can be determined via SSL certificate in the same way that the node identity is. The same conversion for DN Prefix and DN Suffix occurs when generating user certificates using the SSL Toolkit, and the same methods for pulling the identity out of the certificate can be used.
LDAP Based User Authentication
If LDAP authentication is enabled, the LDAP server will pass back the distinguished name (DN) of the user entry in the directory. This value is used to determine the user identity. It may not be clear from the LDAP server configuration exactly how the DN will be formatted when it is passed back. For pattern matching and identity conversion, the case of the field names and spacing of the DN value will be important. To determine the format, a simple ldapsearch can be performed for a known username.
Windows Active Directory:
ldapsearch -W -h adserver.example.com -p 389 -D "cn=hadoopadmin,OU=ServiceUsers,dc=example,dc=com" -b "OU=ServiceUsers,dc=example,dc=com" sAMAccountName=hadoopadmin
OpenLDAP/FreeIPA:
ldapsearch -W -h ldapserver.example.com -p 389 -D "uid=hadoopadmin,cn=users,cn=accounts,dc=example,dc=com" uid=hadoopadmin
In the output, find the dn field for the user:
Windows Active Directory:
dn: CN=hadoopadmin,OU=ServiceUsers,DC=example,DC=com
OpenLDAP/FreeIPA:
dn: uid=hadoopadmin,cn=users,cn=accounts,dc=example,dc=com
Note the case and the spacing of the returned value for later configuration steps.
Kerberos Based User Authentication
When Kerberos authentication is used, the identity of the user is determined from the Kerberos principal. The principal takes the form username@REALM. For example:
hadoopadmin@EXAMPLE.COM
The realm is (by convention) the domain in uppercase.
Identity Conversion
NiFi uses the identity that it determines from the various authentication mechanisms during authorization procedures. In an HDP cluster, authorization is provided by Apache Ranger. Ranger syncs usernames from Active Directory or LDAP, but it does not sync them in the distinguished name format that is returned during authentication against these mechanisms. Likewise, the Kerberos principal format is not typically used in Ranger. As such, the interesting portion of the DN or principal style identity must be parsed out for use with Ranger.
NiFi provides a mechanism for transforming the certificate, LDAP, or Kerberos based identity. This is done via pairings of configuration parameters of the form:
nifi.security.identity.mapping.pattern.<unique>
nifi.security.identity.mapping.value.<unique>
The <unique> portion is replaced with a unique string identifying the purpose of the transformation. There are two pairings created by default ( <unique>=dn, and <unique>=kerb ), but other pairings can be created as needed. For the pattern portion of the pairing, Regular Expression syntax is used to parse the original identity into components. The value portion of the pairing uses these parsed components in variable substitution format to build the translated version of the identity. A few important operators for the translation are:
^ - Denotes the beginning of the value
$ - Denotes the end of the value
() - Captures the matched text into a numbered variable. Variables start at $1 and increment for each capture group in the Regular Expression
. - Matches any character
* - Matches 0 or more of the preceding character
? - After a quantifier (as in .*?), makes the match non-greedy so it consumes as few characters as possible
Using these operators, it is possible to separate any of the identities discussed so far into their components. Using the dn pairing of configuration parameters, separating the DN returned by LDAP into just the username can be accomplished with the following.
Windows Active Directory:
nifi.security.identity.mapping.pattern.dn = ^CN=(.*?),OU=ServiceUsers.*$
nifi.security.identity.mapping.value.dn = $1
OpenLDAP/FreeIPA:
nifi.security.identity.mapping.pattern.dn = ^uid=(.*?),cn=users.*$
nifi.security.identity.mapping.value.dn = $1
If there is a need to use additional components of the DN for the user identity, the DN can be split into additional variables:
nifi.security.identity.mapping.pattern.dn = ^CN=(.*?),OU=(.*?),DC=(.*?),DC=(.*?)$
The full list of variables created by the pattern variable in this example is:
$1 = hadoopadmin
$2 = ServiceUsers
$3 = example
$4 = com
To convert the host identity from SSL certificates (and user identities from internal CA generated user certificates), use an identity mapping pairing such as:
nifi.security.identity.mapping.pattern.host = ^CN=(.*?), CN=hosts.*$
nifi.security.identity.mapping.value.host = $1
In this example, note the space in , CN= and the case of the CN . These result from the conversion that the CA performs when generating the SSL certificate, as described above.
If Kerberos is enabled on the NiFi cluster, the Kerberos principal can be converted to a username in the following way:
nifi.security.identity.mapping.pattern.kerb = ^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb = $1
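To sanity-check a pattern/value pairing before restarting NiFi, the substitution behavior can be approximated with a short Python sketch. NiFi itself uses Java regular expressions, so treat this as an approximation, and note that apply_mapping is a hypothetical helper written for this article, not a NiFi API:

```python
import re

def apply_mapping(pattern, value, identity):
    """Approximate a NiFi identity mapping: if the pattern matches the
    identity, substitute the captured groups into the value template."""
    m = re.match(pattern, identity)
    if m is None:
        return identity  # unmatched identities pass through unchanged
    template = re.sub(r"\$(\d+)", r"\\\1", value)  # $1 -> \1 for Python
    return m.expand(template)

# LDAP-style DN from Active Directory
print(apply_mapping(r"^CN=(.*?),OU=ServiceUsers.*$", "$1",
                    "CN=hadoopadmin,OU=ServiceUsers,DC=example,DC=com"))
# Certificate DN generated by the internal CA (note the ", CN=" spacing)
print(apply_mapping(r"^CN=(.*?), CN=hosts.*$", "$1",
                    "CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com"))
# Kerberos principal
print(apply_mapping(r"^(.*?)@(.*?)$", "$1", "hadoopadmin@EXAMPLE.COM"))
```

The three calls print hadoopadmin, nifi-2.example.com, and hadoopadmin respectively, matching the mappings described above.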
Conclusion
Identity determination in a NiFi cluster can be a complex topic, but thankfully NiFi provides a powerful mechanism for parsing identities into a common format understandable by the Ranger authorization mechanisms. Identity mapping pairings should exist for every authentication method that will be used in the NiFi cluster. An identity should match only a single pattern to ensure reliable mapping. The default pair of mappings (dn and kerb) is defined in the Advanced nifi-properties section of the Ambari NiFi configuration. Additional pairings can be added to the Custom nifi-properties section of the Ambari NiFi configuration.
10-04-2016
01:06 AM
Ambari 2.4 Kerberos with FreeIPA
This tutorial describes how to enable Kerberos using a FreeIPA server for LDAP and KDC functions on HDP 2.5. The following assumptions are made:
An existing HDP 2.5 cluster
No existing IPA server
There are sufficient resources to create an m3.medium VM to house the FreeIPA server
DNS is already taken care of in the environment
FreeIPA will run on RHEL/CentOS 7
Step 1: Setup FreeIPA Server
Install Entropy Tools
Certain operations, like generating encryption keys, require entropy to create random data. A fresh system with no processes running and no real device drivers can have trouble generating enough random data for these types of operations. Install the rng-tools package and start rngd to help with this issue:
yum -y install rng-tools
systemctl start rngd
systemctl enable rngd
Install FreeIPA Server
Install NTP and the FreeIPA software and start the NTP service:
yum -y install ntp ipa-server ipa-server-dns
systemctl enable ntpd
systemctl start ntpd
In order to use FreeIPA for domain resolution within the cluster, there are a few pieces of information that need to be collected:
DNS servers for external lookups. These will be configured as "forwarders" in FreeIPA for handing off DNS resolution for external lookups.
Reverse DNS Zone name. This is used for configuring reverse DNS lookups within FreeIPA. The FreeIPA server will calculate this based on the IP address and Netmask of the server if it is unknown.
DNS domain to use for the cluster
Kerberos realm to use for the cluster (by convention, usually the domain in uppercase)
The hostname of the FreeIPA server
The IP address to use for the FreeIPA server (if there is more than one on the host)
ipa-server-install --domain=example.domain.com \
--realm=EXAMPLE.DOMAIN.COM \
--hostname=ipaserver.example.domain.com \
--ip-address=1.2.3.4 \
--setup-dns \
--forwarder=8.8.8.8 \
--forwarder=8.8.4.4 \
--reverse-zone=3.2.1.in-addr.arpa.
Enable PTR Record Sync
In order for reverse DNS lookups to work, enable PTR record sync on the FreeIPA server. Get a list of the DNS zones created:
ipa dnszone-find --all | grep "Zone name"
For each of the DNS zones, enable PTR sync:
ipa dnszone-mod $zonename --allow-sync-ptr=true
Configure krb5.conf Credential Cache
HDP does not support the in-memory keyring storage of the Kerberos credential cache. Edit the /etc/krb5.conf file and change:
default_ccache_name = KEYRING:persistent:%{uid}
to
default_ccache_name = FILE:/tmp/krb5cc_%{uid}
Create a hadoopadmin user
In order to create users in FreeIPA, an administrative user is required. The default admin@REALM user can be used (password created during IPA server install). Alternatively, create a hadoopadmin user:
kinit admin@EXAMPLE.DOMAIN.COM
ipa user-add hadoopadmin --first=Hadoop --last=Admin
ipa group-add-member admins --users=hadoopadmin
ipa passwd hadoopadmin
Ambari also requires a group to be created called ambari-managed-principals. This group is not currently created by the Ambari Kerberos wizard. Create the group:
ipa group-add ambari-managed-principals
Because FreeIPA automatically expires the new password, it is necessary to kinit as hadoopadmin and change the initial password. The password can be set to the same password unless the password policy prohibits password reuse:
kinit hadoopadmin@EXAMPLE.DOMAIN.COM
Step 2: Prepare the HDP Nodes
First, disable the chronyd service since it interferes with NTP (which FreeIPA prefers):
systemctl stop chronyd
systemctl disable chronyd
Configure the HDP nodes to use the FreeIPA server for DNS resolution:
echo "nameserver $ipaserver_ip_address" > /etc/resolv.conf
All nodes in the HDP cluster must have the ipa-client software installed and be joined to the FreeIPA server:
yum -y install ipa-client
ipa-client-install --domain=example.domain.com \
--server=ipaserver.example.domain.com \
--realm=EXAMPLE.DOMAIN.COM \
--principal=hadoopadmin@EXAMPLE.DOMAIN.COM \
--enable-dns-updates
On the Ambari server node, install the ipa-admintools package:
yum -y install ipa-admintools
Step 3: Enable Experimental FreeIPA Support
Support for FreeIPA is not enabled by default in Ambari. You must enable the experimental functionality in Ambari before you can select FreeIPA as an option in the Kerberos wizard. In a browser, navigate to:
http://ambariserver.example.domain.com:8080/#/experimental
Check the box next to enableipa.
Step 4: Run the Kerberos Wizard
Run the Kerberos wizard from Ambari (Admin -> Kerberos -> Enable Kerberos). Select "Existing IPA" and verify that the prerequisites have been met. Enter the appropriate information into the KDC page. Click through to the Configure Identities page of the wizard. There is a bug in the name of the Spark principal that needs to be corrected. FreeIPA requires principal names to be in lower case, but Ambari allows the cluster name to be in mixed case. If the cluster name contains capital letters, the creation of the Spark principal will fail. To account for this, the principal name should include the toLower() function on the cluster name variable so that capital letters are corrected before the principal is created. Change the spark.history.kerberos.principal parameter to include the toLower() function:
Change from:
${spark-env/spark_user}-${cluster_name}@${realm}
To:
${spark-env/spark_user}-${cluster_name|toLower()}@${realm}
The rest of the wizard should complete successfully.
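As an aside on the PTR-sync step in Step 1, enabling sync for each zone can be scripted rather than run by hand. The Python sketch below extracts zone names from sample `ipa dnszone-find` output and builds the corresponding `ipa dnszone-mod` commands; the exact output format is an assumption, so verify it against your FreeIPA version:

```python
# Hypothetical excerpt of `ipa dnszone-find --all | grep "Zone name"` output
sample_output = """\
  Zone name: example.domain.com.
  Zone name: 3.2.1.in-addr.arpa.
"""

def ptr_sync_commands(find_output):
    """Build an `ipa dnszone-mod` command for each zone listed."""
    commands = []
    for line in find_output.splitlines():
        if "Zone name:" in line:
            zone = line.split(":", 1)[1].strip()
            commands.append("ipa dnszone-mod %s --allow-sync-ptr=true" % zone)
    return commands

for cmd in ptr_sync_commands(sample_output):
    print(cmd)
```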
10-03-2016
10:18 PM
One Way Trust - MIT KDC to Active Directory
Many security environments have strict policies on allowing administrative access to Active Directory. Performance concerns can also require that Hadoop cluster Kerberos principals are not created directly in AD. In these situations, it may be preferable to use a local MIT KDC in the Hadoop cluster to manage service principals while using a one-way trust to allow AD users to utilize the Hadoop environment. This tutorial describes the steps necessary to create such a trust. The following assumptions are made for this tutorial:
An existing HDP cluster
Cluster has Kerberos enabled with an MIT KDC
The MIT KDC realm name is HDP.HORTONWORKS.COM
The MIT KDC server is named kdc-server.hdp.hortonworks.com
The AD domain/realm is AD.HORTONWORKS.COM
Step 1: Configure the Trust in Active Directory
Create a KDC definition in Active Directory
On the AD server, run a command window with Administrator privileges and create a definition for the KDC of the MIT realm:
ksetup /addkdc HDP.HORTONWORKS.COM kdc-server.hdp.hortonworks.com
Create the Trust in Active Directory
On the AD server, create an entry for the one-way trust. The password used here will be used later in the MIT KDC configuration of the trust:
netdom trust HDP.HORTONWORKS.COM /Domain:AD.HORTONWORKS.COM /add /realm /passwordt:BadPass#1
Step 2: Configure Encryption Types
In order for the MIT realm to trust tickets generated by the AD KDC, the encryption types between both KDCs must be compatible. This means that there must be at least one encryption type that is accepted by both the AD server and the MIT KDC server.
Specify Encryption Types in Active Directory
On the AD server, specify which encryption types are acceptable for communication with the MIT realm. Multiple supported encryption types are specified on the command line separated by spaces:
ksetup /SetEncTypeAttr HDP.HORTONWORKS.COM AES256-CTS-HMAC-SHA1-96 AES128-CTS-HMAC-SHA1-96 RC4-HMAC-MD5 DES-CBC-MD5 DES-CBC-CRC
Specify Encryption Types in MIT KDC
By default, all of the encryption types are accepted by the MIT KDC. If security concerns require that the encryption types be limited, this is done in the /etc/krb5.conf file:
[libdefaults]
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5 des-cbc-crc des-cbc-md5
Step 3: Enable Trust in MIT KDC
To complete the trust configuration, the trust must be added to the MIT KDC.
Add Domain to MIT KDC Configuration
In the /etc/krb5.conf file, add the AD domain to the [realms] section:
[realms]
HDP.HORTONWORKS.COM = {
kdc = kdc-server.hdp.hortonworks.com
admin_server = kdc-server.hdp.hortonworks.com
default_domain = hdp.hortonworks.com
}
AD.HORTONWORKS.COM = {
kdc = ad-server.hortonworks.com
admin_server = ad-server.hortonworks.com
default_domain = ad.hortonworks.com
}
Create Trust User
In order for the trust to work, a principal combining the realms in the trust must be created in the MIT KDC. The password for this user must be the same as the password used to create the trust on the AD server:
kinit admin/admin@HDP.HORTONWORKS.COM
kadmin -q "addprinc krbtgt/HDP.HORTONWORKS.COM@AD.HORTONWORKS.COM"
Step 4: Configure AUTH_TO_LOCAL
The Hadoop auth_to_local parameter must be changed to properly convert user principals from the AD domain to usable usernames in the Hadoop cluster. In Ambari, add the following rules to the auth_to_local variable in HDFS -> Configs -> Advanced -> Advanced core-site.xml -> hadoop.security.auth_to_local:
RULE:[1:$1@$0](^.*@AD\.HORTONWORKS\.COM$)s/^(.*)@AD\.HORTONWORKS\.COM$/$1/g
RULE:[2:$1@$0](^.*@AD\.HORTONWORKS\.COM$)s/^(.*)@AD\.HORTONWORKS\.COM$/$1/g
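The effect of these rules, stripping the AD realm from a matching principal, can be approximated with a regular expression. This is a simplified sketch for checking expectations: Hadoop's actual rule evaluation also involves the [n:string] component-formatting step, which is not modeled here:

```python
import re

def strip_ad_realm(principal):
    """Simplified sketch of what the auth_to_local rules above do:
    if the principal is in the AD.HORTONWORKS.COM realm, keep only
    the short username; otherwise leave the principal unchanged."""
    return re.sub(r"^(.*)@AD\.HORTONWORKS\.COM$", r"\1", principal)

print(strip_ad_realm("aduser@AD.HORTONWORKS.COM"))  # aduser
print(strip_ad_realm("hdfs@HDP.HORTONWORKS.COM"))   # unchanged
```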
08-18-2016
04:51 AM
Since version 2.6, Apache Hadoop has had the ability to encrypt files that are written to special directories called encryption zones. In order for this at-rest encryption to work, encryption keys need to be managed by a Key Management Service (KMS). Apache Ranger 0.5 provided a scalable, open source KMS for key management in the Hadoop ecosystem. These features have made it easier to implement business and mission critical applications on Hadoop where security is a concern. These applications have also brought with them the need for fault tolerance and disaster recovery. Using Apache Falcon, it is easy to configure the copying of data from the production Hadoop cluster to an off-site Disaster Recovery (DR) cluster. But what is the best way to handle the encrypted data? Decrypting/encrypting the data to transfer it can hinder performance, but how do you decrypt data on the DR site without the proper keys from the KMS? In this article, we will investigate 3 different scenarios for managing the encryption keys between two clusters when Ranger KMS is used as the key management infrastructure.
Scenario 1 - Completely Separate KMS Instances
The first scenario is the case where the Prod cluster has a Ranger KMS instance, and the DR cluster has a Ranger KMS instance. Each is completely separate with no copying of keys. This configuration has some advantage from a security perspective. Since there are two distinct KMS instances, the keys generated for encryption will be different even for the same directory within HDFS. This can provide a certain level of protection should the production KMS instance be compromised; however, the tradeoff is in the performance of the data copy. To copy the data in this type of environment, use the DistCp command similarly to how you would in a non-encrypted environment.
DistCp will take care of the decrypt/encrypt functions automatically:
ProdCluster:~$ hadoop distcp -update hdfs://ProdCluster:8020/data/encrypted/file1.txt hdfs://DRCluster:8020/data/encrypted/
Scenario 2 - Two KMS instances, one database
In this configuration, the Prod and DR clusters each have a separate KMS Server, but the KMS Servers are both configured to use the same database to store the keys. On the Prod cluster, configure the Ranger KMS per the Hadoop Security Guide. Once the KMS database is set up, copy the database configuration to the DR cluster's Ambari config tab. Make sure to turn off the "Setup Database and Database User" option at the bottom of the config page. Once the KMS instances are both set up and working, creation of the encryption keys in this environment is simpler. Create the encryption key on the Prod cluster using either the Ranger KMS UI (log in to Ranger as keyadmin) or via the CLI:
ProdCluster:~$ hadoop key create ProdKey1
Specify which key to use to encrypt the data directory. On the Prod cluster:
ProdCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
On the DR cluster, use the exact same command (even though it is for the DR cluster):
DRCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
Since both KMS instances use the same keys, the data can be copied using the /.reserved/raw virtual path to avoid decrypting/encrypting the data in transit. Note that it is important to use the -px flag on distcp to ensure that the EDEKs (which are saved as extended attributes) are transferred intact:
ProdCluster~$ hadoop distcp -px hdfs://ProdCluster:8020/.reserved/raw/data/encrypted/file1.txt hdfs://DRCluster:8020/.reserved/raw/data/encrypted/
Scenario 3 - Two KMS instances, two databases
In this configuration, the Prod and DR clusters each have a separate KMS Server, and each has its own database store. In this scenario it is necessary to copy the keys from the Prod KMS database to the DR KMS database. The Prod and DR KMS instances are set up separately per the Hadoop Security Guide. The keys for the encryption zones are created on the Prod cluster (the same as Scenario 2):
ProdCluster:~$ hadoop key create ProdKey1
Specify which key to use to encrypt the data directory on the Prod cluster:
ProdCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
Once the keys are created on the Prod cluster, a script is used to export the keys so they can be copied to the DR cluster. On the node where the KMS Server runs, execute the following:
ProdCluster:~# cd /usr/hdp/current/ranger-kms
ProdCluster:~# ./exportKeysToJCEKS.sh ProdCluster.keystore
Enter Password for the keystore FILE :
Enter Password for the KEY(s) stored in the keystore:
Keys from Ranger KMS Database has been successfully exported into ProdCluster.keystore
Now, the password protected keystore can be securely copied to the DR cluster node where the KMS Server runs:
ProdCluster:~# scp ProdCluster.keystore DRCluster:/usr/hdp/current/ranger-kms/
Next, import the keys into the Ranger KMS database on the DR cluster. On the Ranger KMS node in the DR cluster, execute the following:
DRCluster:~# cd /usr/hdp/current/ranger-kms
DRCluster:~# ./importJCEKSKeys.sh ProdCluster.keystore jceks
Enter Password for the keystore FILE :
Enter Password for the KEY(s) stored in the keystore:
Keys from ProdCluster.keystore has been successfully exported into RangerDB
The last step is to create the encryption zone on the DR cluster and specify which key to use for encryption:
DRCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
Now data can be copied using the /.reserved/raw/ virtual path to avoid the decryption/encryption steps between the clusters:
ProdCluster~$ hadoop distcp -px hdfs://ProdCluster:8020/.reserved/raw/data/encrypted/file1.txt hdfs://DRCluster:8020/.reserved/raw/data/encrypted/
Please note that the key copy procedure will need to be repeated when new keys are created or when keys are rotated within the KMS.
05-17-2016
01:05 PM
Virtual memory swapping can have a large impact on the performance of a Hadoop system. Because of the memory requirements of YARN containers and processes running on the nodes in a cluster, swapping processes out of memory to disk can cause serious performance limitations. As such, the historical recommendation for swappiness (the propensity to swap a process out) on a Hadoop system has been to disable swap altogether. With newer versions of the Linux kernel, however, a swappiness of 0 makes Out Of Memory (OOM) situations more likely to indiscriminately kill important processes to reclaim physical memory. To prevent the system from swapping processes too frequently, while still allowing emergency swapping (instead of killing processes), the recommendation is now to set swappiness to 1 on Linux systems. This still allows swapping, but with the least possible aggressiveness (for comparison, the default value for swappiness is 60). To change the swappiness on a running machine, use the following command:
echo "1" > /proc/sys/vm/swappiness
To ensure the swappiness is set appropriately on reboot, use the following command:
echo "vm.swappiness=1" >> /etc/sysctl.conf