Member since
07-30-2019
181
Posts
205
Kudos Received
51
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2870 | 10-19-2017 09:11 PM
| 959 | 12-27-2016 06:46 PM
| 683 | 09-01-2016 08:08 PM
| 719 | 08-29-2016 04:40 PM
| 1414 | 08-24-2016 02:26 PM
05-22-2018
06:50 PM
@Hiroshi Shidara The installed packages are newer than the CentOS 7.4 packages being requested. Are you on CentOS 7.5 by chance? 7.5 is not supported, only up to 7.4. You’ll need to downgrade your nodes to CentOS 7.4 and try the installation again.
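If you want to confirm the OS release on each node before retrying, a quick check (the hostnames below are placeholders) is:
cat /etc/redhat-release
for h in node1 node2 node3; do ssh $h "cat /etc/redhat-release"; done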
... View more
03-05-2018
03:58 PM
@rmr1989 Rahul Pathak gives a good description of this property and what it does in his response to this HCC question.
... View more
03-05-2018
03:53 PM
1 Kudo
@Simon Tang The appendix of the tutorial points you to the Git repository for the Trucking Demo: https://github.com/orendain/trucking-iot/tree/hadoop-summit-2017 You can clone the repository and get all the pieces you need with this command: git clone https://github.com/orendain/trucking-iot.git
... View more
10-19-2017
09:11 PM
1 Kudo
@dsun During the upgrade process, each component is supposed to be restarted after the hdp-select command has been run so that it picks up the new binaries. The component needs a full stop and start after hdp-select runs; that way it reports to Ambari that its version has changed and what its current state is. In the event that you get stuck (as you did) during the upgrade, you can unwind the versioning with a process like this:
1. Make sure all pieces of the component are running.
2. Run the `hdp-select set` command on all nodes in the cluster to set the new version. Make sure you get all of the pieces for the component (e.g. hadoop-hdfs-namenode, hadoop-hdfs-journalnode, etc.).
3. Restart all processes for the component.
4. Verify that the O/S processes are running with the proper version of the jar files.
5. Lather, rinse, and repeat for all components in the cluster.
Once you have successfully gotten everything restarted with the proper bits, you should be able to manually finalize the upgrade with the following command on the Ambari Server: ambari-server set-current --cluster=<clustername> --version-display-name=HDP-2.6.2.0
If you get an error that components are not upgraded, check the components and hosts again. If everything seems OK, then you may need to tweak a table in the database. I ran into this when Atlas did not properly report the upgraded version to Ambari. NOTE: THIS SHOULD BE DONE WITH THE GUIDANCE OF HORTONWORKS SUPPORT ONLY
ambari=> SELECT h.host_name, hcs.service_name, hcs.component_name, hcs.version FROM hostcomponentstate hcs JOIN hosts h ON hcs.host_id = h.host_id ORDER BY hcs.version, hcs.service_name, hcs.component_name, h.host_name;
host_name | service_name | component_name | version
----------------------------------+----------------+-------------------------+-------------
scregione1.field.hortonworks.com | ATLAS | ATLAS_CLIENT | 2.6.1.0-129
scregionm0.field.hortonworks.com | ATLAS | ATLAS_CLIENT | 2.6.1.0-129
scregionm1.field.hortonworks.com | ATLAS | ATLAS_CLIENT | 2.6.1.0-129
scregionm2.field.hortonworks.com | ATLAS | ATLAS_CLIENT | 2.6.1.0-129
scregionw0.field.hortonworks.com | ATLAS | ATLAS_CLIENT | 2.6.1.0-129
scregionw1.field.hortonworks.com | ATLAS | ATLAS_CLIENT | 2.6.1.0-129
scregionm0.field.hortonworks.com | ATLAS | ATLAS_SERVER | 2.6.1.0-129
scregionm1.field.hortonworks.com | DRUID | DRUID_BROKER | 2.6.2.0-205
scregionm1.field.hortonworks.com | DRUID | DRUID_COORDINATOR | 2.6.2.0-205
scregionw0.field.hortonworks.com | DRUID | DRUID_HISTORICAL | 2.6.2.0-205
scregionw1.field.hortonworks.com | DRUID | DRUID_HISTORICAL | 2.6.2.0-205
scregionw0.field.hortonworks.com | DRUID | DRUID_MIDDLEMANAGER | 2.6.2.0-205
scregionw1.field.hortonworks.com | DRUID | DRUID_MIDDLEMANAGER | 2.6.2.0-205
scregionm2.field.hortonworks.com | DRUID | DRUID_OVERLORD | 2.6.2.0-205
scregionm2.field.hortonworks.com | DRUID | DRUID_ROUTER | 2.6.2.0-205
scregionm2.field.hortonworks.com | DRUID | DRUID_SUPERSET | 2.6.2.0-205
scregione1.field.hortonworks.com | HBASE | HBASE_CLIENT | 2.6.2.0-205
scregionm0.field.hortonworks.com | HBASE | HBASE_CLIENT | 2.6.2.0-205
scregionm1.field.hortonworks.com | HBASE | HBASE_CLIENT | 2.6.2.0-205
. . .
After verifying that you have, indeed, upgraded the components, a simple update command will set the proper version for the erroneous components and allow you to finalize the upgrade: ambari=> update hostcomponentstate set version='2.6.2.0-205' where component_name = 'ATLAS_CLIENT';
UPDATE 6
ambari=> update hostcomponentstate set version='2.6.2.0-205' where component_name = 'ATLAS_SERVER';
UPDATE 1
After cycling the Ambari Server, you should be able to finalize: [root@hostname ~]# ambari-server set-current --cluster=<cluster> --version-display-name=HDP-2.6.2.0
Using python /usr/bin/python
Setting current version...
Enter Ambari Admin login: <username>
Enter Ambari Admin password:
Current version successfully updated to HDP-2.6.2.0
Ambari Server 'set-current' completed successfully.
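As a final sanity check before finalizing, you can confirm on each node that hdp-select reports only the new version (a rough sketch; the old version string below is an assumption based on the output above):
hdp-select versions
hdp-select status | grep 2.6.1
The first command lists the stack versions installed on the node; the second should return nothing once every component has been switched to the new binaries.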
... View more
04-27-2017
09:08 PM
@Edgar Daeds Have you enabled HTTP auth (SPNEGO) for your cluster? If not, things like the Phoenix thin JDBC driver and Thrift servers will not be able to authenticate. Here are the docs on how to enable SPNEGO: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_security/content/ch_enable_spnego_auth_for_hadoop.html
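Once SPNEGO is enabled, you can verify it from any node that has a valid Kerberos ticket; a minimal check against WebHDFS (hostname, port, and principal are placeholders) looks like:
kinit user@EXAMPLE.COM
curl --negotiate -u : "http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS"
A JSON directory listing (rather than a 401) indicates the SPNEGO handshake is working.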
... View more
04-19-2017
03:05 PM
1 Kudo
@Smart Solutions The sandbox uses Ambari to manage the configurations, so changes made directly to the files will simply be overwritten by Ambari. If you want to add a value to the hdfs-site file, go to HDFS -> Configs -> Advanced -> Custom hdfs-site.xml and add the property there. A better way to manage ACLs and access in HDP is to use Ranger (already installed on the Sandbox), which lets you assign privileges via a UI and makes them much easier to manage. Of course, in a production environment you will want to enable Kerberos on your system to provide authentication facilities.
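If you do stay with plain HDFS ACLs rather than Ranger, a minimal sketch (assuming dfs.namenode.acls.enabled=true has been added through the Custom hdfs-site.xml step above; the path and user are placeholders) would be:
hdfs dfs -setfacl -m user:mary:r-x /data/shared
hdfs dfs -getfacl /data/shared
The first command grants read/execute on the directory to an extra user; the second shows the resulting ACL entries.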
... View more
03-24-2017
01:51 PM
@Mustafa Kemal MAYUK Check out my article on securing a cluster with FreeIPA as the identity management system. Ambari 2.4 Kerberos with FreeIPA
... View more
02-16-2017
01:21 PM
@Saurav Ranjit From the database check log, it appears your PostgreSQL database is down. Use the systemctl command to check and restart the database server.
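For example, on a systemd-based host (the unit name may differ, e.g. postgresql-9.x, depending on how it was installed):
systemctl status postgresql
systemctl restart postgresql
systemctl enable postgresql
Once the database is back up, re-run the check that produced that log.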
... View more
12-30-2016
02:44 PM
2 Kudos
@Indrajit swain How did you connect to the Sandbox? The 2.5 Sandbox uses docker. The SSH server for the docker container is on port 2222. You'll need to connect as follows: $ ssh root@localhost -p 2222
... View more
12-27-2016
06:46 PM
2 Kudos
You bring up a good point about impersonation attacks. If an attacker is able to authenticate as (or impersonate) another user, they can gain access to all kinds of data, keys, etc., that they shouldn't. This is why it is very important to use a reliable authentication mechanism (e.g. Kerberos), require users to change passwords regularly, and use secure passwords. That reduces the chances that impersonation attacks will succeed.
That being said, there are a couple of impersonation scenarios that merit discussion:
Superuser impersonation (hdfs) - If an HDFS superuser account is compromised, that superuser would also have to have permissions on the EZ key via Ranger for the user to see any unencrypted data. By default, the hdfs user only gets Generate EEK and Get Metadata privileges on the keys in the KMS. That means that a user who impersonates the hdfs user still won't be able to decrypt any file data (or even the EDEK stored in the file metadata on the NN).
Valid user impersonation - If a valid user account is impersonated, there are multiple authorization checks that need to be passed for the user to gain access to file data. That user would have to have HDFS permissions on the directory/file within the EZ to read the file, and the user would need Get Metadata and Decrypt EEK permissions on the EZ key to decrypt the file. If both of those authorizations exist for the compromised user, the attacker would be able to decrypt files within the EZ, but would not have access to the EZ key, nor the KMS master key.
All of the creation, encryption, and decryption of the DEKs is handled within the KMS. The user never sees the key that was used to encrypt the DEK (the EZ key); the user only sees the EDEK, or the DEK. To maintain the integrity of the DEK as it is passed to the user to encrypt or decrypt the file, it is HIGHLY recommended to enable SSL on the KMS with a certificate that is trusted by the DFSClient (well-known CA, internal CA trusted by the host, etc.).
The most important things for ensuring the security of the system are the following:
Protect against user account compromise - Use secure passwords and rotate passwords regularly. Use Kerberos so that users must authenticate via a reliable mechanism. If Kerberos is not enabled, setting the HADOOP_USER_NAME variable means you can impersonate anyone at any time. I've had customers say they want to secure their cluster without Kerberos; there is no such thing.
Protect the KMS keystore - It is imperative that the KMS keystore is kept secure. The database used as the backing store must be secured, locked down, restricted, firewalled, monitored, audited, and guarded. Period. If the keystore can be compromised, then the EZ keys can be compromised and none of the data is secure.
Secure network communication - The transmission of the keys between the KMS and the DFSClient needs to be secure. If it is not, then the DEK will be transmitted in the open when the DFSClient requests the unencrypted key.
Rotating keys for an EZ helps to minimize the impact of a security breach. If an attacker gains access to the EZ somehow (most likely via a compromise of the KMS backing store or a brute-force attack on the EDEK from the NN metadata), then rotating the keys regularly will minimize the exposure area (assuming a single key is compromised and not all of the keys). It is very expensive to re-key all of the data in the EZ, because data must be copied out of the EZ and then back into it after the key is rotated in order to re-encrypt it and generate a new EDEK to store in the NN metadata.
... View more
10-14-2016
08:19 PM
12 Kudos
NiFi Identity Conversion
In a secure NiFi environment, the identity of a user can be determined in a number of ways depending on the authentication configuration. Machines also have an identity that needs to be determined upon authentication. Determining the identity of an entity is important to ensure proper authorization and access to resources.
Machine Identity
The identity of the node in a NiFi cluster is determined by the SSL certificate that is used for secure communication with other nodes in the cluster. This certificate can be generated by the internal Certificate Authority provided with HDF, or by an external CA. Once SSL is enabled on the cluster using the certificates, they will be stored (by default) in the /etc/nifi/conf/keystore.jks keystore.
To get the node's identity as specified in the certificate, first get the keystore password from the nifi.properties file, then run the keytool command:
cat /etc/nifi/conf/nifi.properties | grep keystorePasswd
nifi.security.keystorePasswd=lF6e7sJsD3KxwNsrVqeXbYhGNu3QqTlhLmC5ztwlX/c
keytool -list -v -keystore /etc/nifi/conf/keystore.jks
This command will print out all of the information about the node's certificate. The Owner field contains the node's identity.
Alias name: nifi-key
Creation date: Oct 7, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 2
Certificate[1]:
Owner: CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com
Issuer: CN=nifi-1.example.com, OU=NIFI
Serial number: 157a059d1cb00000000
Valid from: Fri Oct 07 18:13:43 UTC 2016 until: Mon Oct 07 18:13:43 UTC 2019
Certificate fingerprints:
MD5: C2:BD:6A:CE:86:05:C9:C1:E8:DE:0C:C1:62:B5:27:5B
SHA1: 3A:BA:E4:35:DA:91:D2:DB:E3:A1:BA:C8:7F:19:C4:C2:BD:81:5A:8F
SHA256: 2A:4F:05:51:9E:4F:50:8B:0D:B0:4C:55:AD:21:65:CF:5D:C2:85:8B:BA:0F:CB:5A:95:AC:C4:3D:08:62:13:02
Signature algorithm name: SHA256withRSA
Version: 3
Extensions:
...
In the example above, the identity of the node (Owner of the certificate) is CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com .
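If you only need the Owner field, you can trim the keytool output down to just that line (same keystore and password as above):
keytool -list -v -keystore /etc/nifi/conf/keystore.jks | grep "Owner:"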
If the certificates are managed by the internal CA, the node identity is determined by two parameters in the NiFi configuration that convert the hostname into a distinguished name (DN) format:
The node identity from the certificate above was generated using the parameters shown in the Ambari NiFi configuration. The NiFi CA uses the CA DN Prefix + hostname + CA DN Suffix to generate the Owner field stored in the certificate. It is important to note the transformation that occurs between the configuration parameters and the resulting identity.
Hostname: nifi-2.example.com
CA DN Prefix: CN=
CA DN Suffix: ,cn=hosts,cn=accounts,dc=example,dc=com
Is translated into a node identity of:
CN=nifi-2.example.com, CN=hosts, CN=accounts, DC=example, DC=com
The lowercase attribute identifiers (cn, dc, etc.) are converted to uppercase (CN, DC, etc.) and a space is added between each component of the distinguished name. These transformations will become important later when identity conversions are created.
User Identity
The user's identity can be determined in multiple ways depending on how security is configured within the cluster:
If certificate based user authentication is used, the user identity is determined from the certificate just as it is for node identity.
If LDAP authentication is used, the user identity is determined by the distinguished name attribute passed back from the LDAP server.
If Kerberos authentication is used, the user identity is determined based on the Kerberos principal.
Certificate Based User Authentication
The user identity can be determined via SSL certificate in the same way that the node identity is. The same conversion for DN Prefix and DN Suffix occurs when generating user certificates using the SSL Toolkit, and the same methods for pulling the identity out of the certificate can be used.
LDAP Based User Authentication
If LDAP authentication is enabled, the LDAP server will pass back the distinguished name (DN) of the user entry in the directory. This value is used to determine the user identity. It may not be clear from the LDAP server configuration exactly how the DN will be formatted when it is passed back. For pattern matching and identity conversion, the case of the field names and the spacing of the DN value are important. To determine the format, a simple ldapsearch can be performed for a known username.
Windows Active Directory:
ldapsearch -W -h adserver.example.com -p 389 -D "cn=hadoopadmin,OU=ServiceUsers,dc=example,dc=com" -b "OU=ServiceUsers,dc=example,dc=com" sAMAccountName=hadoopadmin
OpenLDAP/FreeIPA:
ldapsearch -W -h ldapserver.example.com -p 389 -D "uid=hadoopadmin,cn=users,cn=accounts,dc=example,dc=com" uid=hadoopadmin
In the output, find the dn field for the user:
Windows Active Directory:
dn: CN=hadoopadmin,OU=ServiceUsers,DC=example,DC=com
OpenLDAP/FreeIPA:
dn: uid=hadoopadmin,cn=users,cn=accounts,dc=example,dc=com
Note the case and the spacing of the returned value for later configuration steps.
Kerberos Based User Authentication
When Kerberos authentication is used, the identity of the user is determined from the Kerberos principal. The principal takes the form username@REALM. For example:
hadoopadmin@EXAMPLE.COM
The realm is (by convention) the domain in uppercase.
Identity Conversion
NiFi uses the identity that it determines from the various authentication mechanisms during authorization procedures. In an HDP cluster, authorization is provided by Apache Ranger. Ranger syncs usernames from Active Directory or LDAP, but it does not sync them in the distinguished name format that is returned during authentication against these mechanisms. Likewise, the Kerberos principal format is not typically used in Ranger. As such, the interesting portion of the DN or principal style identity must be parsed out for use with Ranger.
NiFi provides a mechanism for transforming the certificate, LDAP, or Kerberos based identity. This is done via pairings of configuration parameters of the form:
nifi.security.identity.mapping.pattern.<unique>
nifi.security.identity.mapping.value.<unique>
The <unique> portion is replaced with a unique string identifying the purpose of the transformation. There are two pairings created by default (<unique>=dn and <unique>=kerb), but other pairings can be created as needed. For the pattern portion of the pairing, Regular Expression syntax is used to parse the original identity into components. The value portion of the pairing uses these parsed components in variable substitution format to build the translated version of the identity. A few important operators for the translation are:
^ - Denotes the beginning of the value
$ - Denotes the end of the value
() - Assigns matched strings to a variable. Variable names start with 1 and increment for each time used in the Regular Expression
. - Matches any character
* - Matches 0 or more of the preceding character
? - When it follows a quantifier (as in (.*?)), makes the match non-greedy so it captures as few characters as possible
Using these operators, it is possible to separate any of the identities discussed so far into their components. Using the dn pairing of configuration parameters, separating the DN returned by LDAP into just the username can be accomplished with the following.
Windows Active Directory:
nifi.security.identity.mapping.pattern.dn = ^CN=(.*?),OU=ServiceUsers.*$
nifi.security.identity.mapping.value.dn = $1
OpenLDAP/FreeIPA:
nifi.security.identity.mapping.pattern.dn = ^uid=(.*?),cn=users.*$
nifi.security.identity.mapping.value.dn = $1
If there is a need to use additional components of the DN for the user identity, the DN can be split into additional variables:
nifi.security.identity.mapping.pattern.dn = ^CN=(.*?),OU=(.*?),DC=(.*?),DC=(.*?)$
The full list of variables created by the pattern variable in this example is:
$1 = hadoopadmin
$2 = ServiceUsers
$3 = example
$4 = com
To convert the host identity from SSL certificates (and user identities from internal CA generated user certificates), use an identity mapping pairing such as:
nifi.security.identity.mapping.pattern.host = ^CN=(.*?), CN=hosts.*$
nifi.security.identity.mapping.value.host = $1
In this example, note the space in , CN= and the case of CN. These exist because of the conversion that the CA performs when generating the SSL certificate, as described above.
If Kerberos is enabled on the NiFi cluster, the Kerberos principal can be converted to a username in the following way:
nifi.security.identity.mapping.pattern.kerb = ^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb = $1
Conclusion
Identity determination in a NiFi cluster can be a complex topic, but thankfully NiFi provides a powerful mechanism for parsing identities into a common format understandable by the Ranger authorization mechanisms. Identity mapping pairings should exist for every method of identity mapping that will be needed in the NiFi cluster. An identity to be mapped should only match a single set of mapping rules to ensure reliable mapping of identities. The default pair of mappings (dn and kerb) is defined in the Advanced nifi-properties section of the Ambari NiFi configuration. Additional pairings can be added to the Custom nifi-properties section of the Ambari NiFi configuration.
... View more
10-08-2016
03:30 AM
1 Kudo
@hu bai Contrary to popular belief, it is not necessary to enable Kerberos to use the Ranger plugin. Kerberos provides authentication, while Ranger handles authorization. You can use other authentication techniques to identify the user if you choose. Kerberos is a very secure and reliable way of authenticating a user, and that is why it is frequently used. However, you can use Unix auth or LDAP authentication in your cluster to identify the user. The username that submits the Storm topology is the one used for authorization with Ranger. Ranger will then use its policy information to determine what the user is allowed to do.
... View more
10-08-2016
03:17 AM
1 Kudo
@Houssam Manik As @slachterman says, the LDAP attributes that map to a user's username, group membership, etc., are configurable. The reason is that an administrator can modify the directory schema, or the schema may have evolved over time. For Active Directory 2012, the default values you'll want to use are:
User Object Class: person
Username Attribute: sAMAccountName
User Group Name Attribute: sAMAccountName
Group Member Attribute: member
Group Name Attribute: sAMAccountName
Group Object Class: group
For FreeIPA, these change to:
User Object Class: posixaccount
Username Attribute: uid
User Group Name Attribute: memberOf
Group Member Attribute: member
Group Name Attribute: cn
Group Object Class: posixgroup
The base of the directory where Ranger starts looking for users and groups is specified by the User Search Base and Group Search Base parameters. For AD, you'd want to use something like:
User Search Base: CN=Users,DC=example,DC=com
Group Search Base: CN=Groups,DC=example,DC=com
And for FreeIPA, something similar to:
User Search Base: cn=users,cn=accounts,dc=example,dc=com
Group Search Base: cn=groups,cn=accounts,dc=example,dc=com
You can also specify search filters with syntax similar to: (|(memberOf=hadoop-admins)(memberOf=hadoop-users))
Here is a guide to LDAP Search Filters for more information.
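Before saving the Ranger usersync settings, it can be worth confirming that the search base and filter actually return the users you expect. A rough ldapsearch sketch (bind DNs, hosts, and bases are placeholders for your directory):
ldapsearch -W -h adserver.example.com -p 389 -D "cn=hadoopadmin,OU=ServiceUsers,dc=example,dc=com" -b "CN=Users,DC=example,DC=com" "(sAMAccountName=hadoopadmin)" sAMAccountName memberOf
ldapsearch -W -h ipaserver.example.com -p 389 -D "uid=hadoopadmin,cn=users,cn=accounts,dc=example,dc=com" -b "cn=users,cn=accounts,dc=example,dc=com" "(uid=hadoopadmin)" uid memberOf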
... View more
10-08-2016
02:13 AM
@Jasper The first thing I see that is probably incorrect is the URL for main.ldapRealm.contextFactory.url. You specify port 88 in the URL, which is the Kerberos KDC port; LDAP listens on port 389 by default (636 for LDAPS). Change: <value>ldap://xxxxxxxxxxxxx:88</value> to <value>ldap://xxxxxxxxxxxxx:389</value> That should get you along the way.
... View more
10-04-2016
01:06 AM
16 Kudos
Ambari 2.4 Kerberos with FreeIPA
This tutorial describes how to enable Kerberos using a FreeIPA server for LDAP and KDC functions on HDP 2.5. The following assumptions are made:
An existing HDP 2.5 cluster
No existing IPA server
There are sufficient resources to create an m3.medium VM to house the FreeIPA server
DNS is already taken care of in the environment
FreeIPA will run on RHEL/CentOS 7
Step 1: Setup FreeIPA Server
Install Entropy Tools
Certain operations, like generating encryption keys, require entropy to create random data. A fresh system with no processes running and no real device drivers can have issues generating enough random data for these types of operations. Install the rng-tools package and start rngd to help with this issue: yum -y install rng-tools
systemctl start rngd
systemctl enable rngd
Install FreeIPA Server
Install NTP and the FreeIPA software and start the NTP service: yum -y install ntp ipa-server ipa-server-dns
systemctl enable ntpd
systemctl start ntpd
In order to use FreeIPA for domain resolution within the cluster, there are a few pieces of information that need to be collected:
DNS servers for external lookups. These will be configured as "forwarders" in FreeIPA for handing off DNS resolution for external lookups.
Reverse DNS Zone name. This is used for configuring reverse DNS lookups within FreeIPA. The FreeIPA server will calculate this based on the IP address and Netmask of the server if it is unknown.
DNS domain to use for the cluster
Kerberos realm to use for the cluster (by convention, usually the domain in uppercase)
The hostname of the FreeIPA server
The IP address to use for the FreeIPA server (if there is more than one on the host)
ipa-server-install --domain=example.domain.com \
--realm=EXAMPLE.DOMAIN.COM \
--hostname=ipaserver.example.domain.com \
--ip-address=1.2.3.4 \
--setup-dns \
--forwarder=8.8.8.8 \
--forwarder=8.8.8.4 \
--reverse-zone=3.2.1.in-addr.arpa.
Enable PTR Record Sync
In order for reverse DNS lookups to work, enable PTR record sync on the FreeIPA server. Get a list of the DNS zones created: ipa dnszone-find --all | grep "Zone name"
For each of the DNS zones, enable PTR sync: ipa dnszone-mod $zonename --allow-sync-ptr=true
Configure the krb5.conf Credential Cache
HDP does not support the in-memory keyring storage of the Kerberos credential cache. Edit the /etc/krb5.conf file and change: default_ccache_name = KEYRING:persistent:%{uid}
to: default_ccache_name = FILE:/tmp/krb5cc_%{uid}
Create a hadoopadmin User
In order to create users in FreeIPA, an administrative user is required. The default admin@REALM user can be used (its password is created during the IPA server install). Alternatively, create a hadoopadmin user: kinit admin@EXAMPLE.DOMAIN.COM
ipa user-add hadoopadmin --first=Hadoop --last=Admin
ipa group-add-member admins --users=hadoopadmin
ipa passwd hadoopadmin
Ambari also requires a group to be created called ambari-managed-principals. This group is not currently created by the Ambari Kerberos wizard. Create the group: ipa group-add ambari-managed-principals
Because FreeIPA automatically expires the new password, it is necessary to kinit as hadoopadmin and change the initial password. The password can be set to the same value unless the password policy prohibits password reuse: kinit hadoopadmin@EXAMPLE.DOMAIN.COM
Step 2: Prepare the HDP Nodes
First, disable the chronyd service since it interferes with NTP (which FreeIPA prefers): systemctl stop chronyd
systemctl disable chronyd
Configure the HDP nodes to use the FreeIPA server for DNS resolution: echo "nameserver $ipaserver_ip_address" > /etc/resolv.conf
All nodes in the HDP cluster must have the ipa-client software installed and be joined to the FreeIPA server: yum -y install ipa-client
ipa-client-install --domain=example.domain.com \
--server=ipaserver.example.domain.com \
--realm=EXAMPLE.DOMAIN.COM \
--principal=hadoopadmin@EXAMPLE.DOMAIN.COM \
--enable-dns-updates
On the Ambari server node, install the ipa-admintools package: yum -y install ipa-admintools
Step 3: Enable Experimental FreeIPA Support
Support for FreeIPA is not enabled by default in Ambari. You must enable the experimental functionality in Ambari before you can select FreeIPA as an option in the Kerberos wizard. In a browser, navigate to: http://ambariserver.example.domain.com:8080/#/experimental
Check the box next to enableipa.
Step 4: Run the Kerberos Wizard
Run the Kerberos wizard from Ambari (Admin -> Kerberos -> Enable Kerberos). Select "Existing IPA" and verify that the prerequisites have been met. Enter the appropriate information into the KDC page, then click through to the Configure Identities page of the wizard. There is a bug in the name of the Spark principal that needs to be corrected. FreeIPA requires principal names to be in lower case, but Ambari allows the cluster name to be in mixed case. If the cluster name contains capital letters, the creation of the Spark principal will fail. To account for this, the principal names should apply the toLower() function to the cluster name variable so that capital letters are corrected before the principal is created. Change the spark.history.kerberos.principal parameter to include the toLower() function:
Change from: ${spark-env/spark_user}-${cluster_name}@${realm}
To: ${spark-env/spark_user}-${cluster_name|toLower()}@${realm}
The rest of the Wizard should complete successfully.
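Once the wizard finishes, a quick way to confirm that the principals and keytabs were created correctly is to test one of the generated service keytabs on a cluster node (the keytab path and principal below are typical defaults and may differ in your cluster):
klist -kt /etc/security/keytabs/hdfs.headless.keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-mycluster@EXAMPLE.DOMAIN.COM
klist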
... View more
10-03-2016
10:18 PM
17 Kudos
One Way Trust - MIT KDC to Active Directory
Many security environments have strict policies on allowing administrative access to Active Directory. Some performance issues can also require that Hadoop cluster principals for Kerberos not be created directly in AD. To aid in these situations, it may be preferable to use a local MIT KDC in the Hadoop cluster to manage service principals while using a one-way trust to allow AD users to utilize the Hadoop environment. This tutorial describes the steps necessary to create such a trust. The following assumptions are made for this tutorial:
An existing HDP cluster
The cluster has Kerberos enabled with an MIT KDC
The MIT KDC realm name is HDP.HORTONWORKS.COM
The MIT KDC server is named kdc-server.hdp.hortonworks.com
The AD domain/realm is AD.HORTONWORKS.COM
Step 1: Configure the Trust in Active Directory
Create a KDC definition in Active Directory
On the AD server, run a command window with Administrator privileges and create a definition for the KDC of the MIT realm: ksetup /addkdc HDP.HORTONWORKS.COM kdc-server.hdp.hortonworks.com
Create the Trust in Active Directory
On the AD server, create an entry for the one-way trust. The password used here will be used later in the MIT KDC configuration of the trust: netdom trust HDP.HORTONWORKS.COM /Domain:AD.HORTONWORKS.COM /add /realm /passwordt:BadPass#1
Step 2: Configure Encryption Types
In order for the MIT realm to trust tickets generated by the AD KDC, the encryption types between both KDCs must be compatible. This means that there must be at least one encryption type that is accepted by both the AD server and the MIT KDC server.
Specify Encryption Types in Active Directory
On the AD server, specify which encryption types are acceptable for communication with the MIT realm. Multiple supported encryption types are specified on the command line, separated by spaces: ksetup /SetEncTypeAttr HDP.HORTONWORKS.COM AES256-CTS-HMAC-SHA1-96 AES128-CTS-HMAC-SHA1-96 RC4-HMAC-MD5 DES-CBC-MD5 DES-CBC-CRC
Specify Encryption Types in MIT KDC
By default, all of the encryption types are accepted by the MIT KDC. If security concerns require that the encryption types be limited, this is done in the /etc/krb5.conf file: [libdefaults]
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5 des-cbc-crc des-cbc-md5
Step 3: Enable Trust in MIT KDC
To complete the trust configuration, the trust must be added to the MIT KDC.
Add Domain to MIT KDC Configuration
In the /etc/krb5.conf file, add the AD domain to the [realms] section: [realms]
HDP.HORTONWORKS.COM = {
kdc = kdc-server.hortonworks.com
admin_server = kdc-server.hortonworks.com
default_domain = hdp.hortonworks.com
}
AD.HORTONWORKS.COM = {
kdc = ad-server.hortonworks.com
admin_server = ad-server.hortonworks.com
default_domain = ad.hortonworks.com
}
Create Trust User
In order for the trust to work, a principal combining the realms in the trust must be created in the MIT KDC. The password for this user must be the same as the password used to create the trust on the AD server: kinit admin/admin@HDP.HORTONWORKS.COM
kadmin -q "addprinc krbtgt/HDP.HORTONWORKS.COM@AD.HORTONWORKS.COM"
Step 4: Configure AUTH_TO_LOCAL
The Hadoop auth_to_local parameter must be changed to properly convert user principals from the AD domain to usable usernames in the Hadoop cluster. In Ambari, add the following rules to the auth_to_local variable in HDFS -> Configs -> Advanced -> Advanced core-site.xml -> hadoop.security.auth_to_local:
RULE:[1:$1@$0](^.*@AD\.HORTONWORKS\.COM$)s/^(.*)@AD\.HORTONWORKS\.COM$/$1/g
RULE:[2:$1@$0](^.*@AD\.HORTONWORKS\.COM$)s/^(.*)@AD\.HORTONWORKS\.COM$/$1/g
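To confirm the trust and the auth_to_local rules end to end, authenticate as an AD user on a cluster node and run a simple HDFS command (the username below is illustrative):
kinit aduser@AD.HORTONWORKS.COM
hadoop org.apache.hadoop.security.HadoopKerberosName aduser@AD.HORTONWORKS.COM
hdfs dfs -ls /user
The second command prints the short name the rules produce, and the last one should succeed using the cross-realm ticket.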
... View more
09-06-2016
06:03 PM
@Ryan Hanson The obvious issue is the circular symlink references. Have you created symlinks prior to running the installer?
... View more
09-01-2016
08:08 PM
2 Kudos
@Siva Nagisetty The Data Governance documentation contains references to setting up Governance with Apache Atlas for various components including Kafka.
... View more
08-30-2016
07:01 PM
1 Kudo
@mkataria In order to do superuser commands (like enter safe mode, balance cluster, etc.), you have to run the command as the user that started the NameNode process. If the NameNode is running as the hdfs user, then you will need to issue these commands as the hdfs user: sudo -u hdfs hdfs balancer -threshold 5
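The same pattern applies to the other superuser operations mentioned, for example (assuming the NameNode runs as hdfs):
sudo -u hdfs hdfs dfsadmin -safemode get
sudo -u hdfs hdfs dfsadmin -report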
... View more
08-29-2016
04:40 PM
@Eyad Garelnabi According to the Hadoop Documentation, permissions checks for the superuser always succeed, even if you try to restrict them. The process (and group) used to start the namenode become the superuser and can always do everything within HDFS.
... View more
08-24-2016
04:30 PM
1 Kudo
@Sami Ahmad The following line seems to indicate the issue: Caused by: java.io.IOException: Check-sum mismatch between hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/sami/error1.log and hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/zhang/.distcp.tmp.attempt_1472051594557_0001_m_000001_0. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.)
Is the block size set differently between the source and target clusters?
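If the block sizes do differ, re-running the copy with block sizes preserved is usually the cleanest fix, as the error message suggests (paths taken from your error output):
hadoop distcp -pb hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/sami/error1.log hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/zhang/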
... View more
08-24-2016
02:26 PM
1 Kudo
@da li The answers here are close, but not quite. The proxy user settings take the form hadoop.proxyuser.<username>.[groups|hosts]. So, in the Custom core-site section of Ambari, add the following two parameters: hadoop.proxyuser.root.hosts=*
hadoop.proxyuser.root.groups=*
This will correct the impersonation error.
... View more
08-23-2016
07:10 PM
2 Kudos
@mqadri FreeIPA does not currently support Multi-tenancy. There was an article written with regards to what was required in V3 to support this, but it has not been implemented as of 2015. The Request for Enhancement has been open for 4 years or so, but development has been in the direction of IPA to IPA trusts (at least as of Feb 2015). The version of IPA included with RHEL/CentOS 6 is 3.0.0: [root@sandbox resources]# yum info ipa-server
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
* base: mirror.team-cymru.org
* epel: mirrors.mit.edu
* extras: ftp.usf.edu
* updates: dallas.tx.mirror.xygenhosting.com
Available Packages
Name : ipa-server
Arch : x86_64
Version : 3.0.0
Release : 50.el6.centos.1
Size : 1.1 M
Repo : base
Summary : The IPA authentication server
URL : http://www.freeipa.org/
License : GPLv3+
Description : IPA is an integrated solution to provide centrally managed Identity (machine,
: user, virtual machines, groups, authentication credentials), Policy
: (configuration settings, access control information) and Audit (events,
: logs, analysis thereof). If you are installing an IPA server you need
: to install this package (in other words, most people should NOT install
: this package).
The version included with RHEL/CentOS 7 is version 4.2, but it still does not seem to support multi-tenancy per the above links.
... View more
08-23-2016
04:56 PM
3 Kudos
@Vincent Romeo The hive.metastore.heapsize is not a parameter that is in a file like hive-site.xml. This value is used by Ambari for substitution into the hive-env template file. You can see this section in the text box in Ambari: if [ "$SERVICE" = "metastore" ]; then
export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
The {{hive_metastore_heapsize}} is where the substitution is made.
... View more
08-18-2016
09:01 PM
2 Kudos
@ripunjay godhani No, it is not possible to modify the install locations. These locations are specified at the time the RPMs are built and cannot be changed. Third-party software will depend on HDP being installed in this location, and Ambari distributes all of the config files to /etc on all of the nodes. Log file directories can be changed, but not the binary installation and config file directories.
... View more
08-18-2016
08:49 PM
3 Kudos
@Kumar Veerappan You should be able to read the /etc/ambari-agent/conf/ambari-agent.ini file on any node in the cluster. You will find a [server] section that will tell you where the Ambari server is: [server]
hostname = ambari-server.example.com
url_port = 8440
secured_url_port = 8441
... View more
08-18-2016
04:51 AM
10 Kudos
Since version 2.6, Apache Hadoop has had the ability to encrypt files that are written to special directories called encryption zones. In order for this at-rest encryption to work, encryption keys need to be managed by a Key Management Service (KMS). Apache Ranger 0.5 provided a scalable, open source KMS to provide key management for the Hadoop ecosystem. These features have made it easier to implement business and mission critical applications on Hadoop where security is a concern. These business/mission critical applications have also brought with them the need for fault tolerance and disaster recovery. Using Apache Falcon, it is easy to configure the copying of data from the Production Hadoop cluster to an off-site Disaster Recovery (DR) cluster. But what is the best way to handle the encrypted data? Decrypting/encrypting the data to transfer it can hinder performance, but how do you decrypt data on the DR site without the proper keys from the KMS? In this article, we will investigate 3 different scenarios for managing the encryption keys between two clusters when Ranger KMS is used as the key management infrastructure.
Scenario 1 - Completely Separate KMS Instances
The first scenario is the case where the Prod cluster has a Ranger KMS instance, and the DR cluster has a Ranger KMS instance. Each is completely separate, with no copying of keys. This configuration has some advantage from a security perspective. Since there are two distinct KMS instances, the keys generated for encryption will be different even for the same directory within HDFS. This can provide a certain level of protection should the Production KMS instance be compromised; however, the tradeoff is in the performance of the data copy. To copy the data in this type of environment, use the DistCp command similarly to how you would in a non-encrypted environment. DistCp will take care of the decrypt/encrypt functions automatically: ProdCluster:~$ hadoop distcp -update hdfs://ProdCluster:8020/data/encrypted/file1.txt hdfs://DRCluster:8020/data/encrypted/
Scenario 2 - Two KMS instances, one database
In this configuration, the Prod and DR clusters each have a separate KMS Server, but the KMS Servers are both configured to use the same database to store the keys. On the Prod cluster, configure the Ranger KMS per the Hadoop Security Guide. Once the KMS database is set up, copy the database configuration to the DR cluster's Ambari config tab. Make sure to turn off the "Setup Database and Database User" option at the bottom of the config page. Once the KMS instances are both set up and working, creation of the encryption keys in this environment is simpler. Create the encryption key on the Prod cluster using either the Ranger KMS UI (log in to Ranger as keyadmin) or via the CLI: ProdCluster:~$ hadoop key create ProdKey1
Specify which key to use to encrypt the data directory. On the Prod cluster: ProdCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
On the DR cluster, use the exact same command (even though it is for the DR cluster): DRCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
Since both KMS instances use the same keys, the data can be copied using the /.reserved/raw virtual path to avoid decrypting/encrypting the data in transit.
Note that it is important to use the -px flag on distcp to ensure that the EDEK (which is saved as an extended attribute) is transferred intact: ProdCluster~$ hadoop distcp -px hdfs://ProdCluster:8020/.reserved/raw/data/encrypted/file1.txt hdfs://DRCluster:8020/.reserved/raw/data/encrypted/
Scenario 3 - Two KMS instances, two databases
In this configuration, the Prod and DR clusters each have a separate KMS Server, and each has its own database store. In this scenario it is necessary to copy the keys from the Prod KMS database to the DR KMS database. The Prod and DR KMS instances are set up separately per the Hadoop Security Guide. The keys for the encryption zones are created on the Prod cluster (the same as Scenario 2): ProdCluster:~$ hadoop key create ProdKey1
Specify which key to use to encrypt the data directory on the Prod cluster: ProdCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
Once the keys are created on the Prod cluster, a script is used to export the keys so they can be copied to the DR cluster. On the node where the KMS Server runs, execute the following: ProdCluster:~# cd /usr/hdp/current/ranger-kms
ProdCluster:~# ./exportKeysToJCEKS.sh ProdCluster.keystore
Enter Password for the keystore FILE :
Enter Password for the KEY(s) stored in the keystore:
Keys from Ranger KMS Database has been successfully exported into ProdCluster.keystore
Now, the password protected keystore can be securely copied to the DR cluster node where the KMS Server runs: ProdCluster:~# scp ProdCluster.keystore DRCluster:/usr/hdp/current/ranger-kms/ Next, import the keys into the Ranger KMS database on the DR cluster. On the Ranger KMS node in the DR cluster, execute the following: DRCluster:~# cd /usr/hdp/current/ranger-kms
DRCluster:~# ./importJCEKSKeys.sh ProdCluster.keystore jceks
Enter Password for the keystore FILE :
Enter Password for the KEY(s) stored in the keystore:
Keys from ProdCluster.keystore has been successfully exported into RangerDB
The last step is to create the encryption zone on the DR cluster and specify which key to use for encryption: DRCluster:~$ hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
Now data can be copied using the /.reserved/raw/ virtual path to avoid the decryption/encryption steps between the clusters: ProdCluster~$ hadoop distcp -px hdfs://ProdCluster:8020/.reserved/raw/data/encrypted/file1.txt hdfs://DRCluster:8020/.reserved/raw/data/encrypted/
Please note that the key copy procedure will need to be repeated when new keys are created or when keys are rotated within the KMS.
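After the copy, you can sanity-check that both clusters see the same zone and key, and that the file decrypts for an authorized user on the DR side (a rough sketch using the paths above):
ProdCluster:~$ hdfs crypto -listZones
DRCluster:~$ hdfs crypto -listZones
DRCluster:~$ hdfs dfs -cat /data/encrypted/file1.txt | head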
... View more
08-17-2016
05:22 PM
4 Kudos
@Randy Gelhausen You can set the "Remote Owner" attribute to the user you want to own the files in HDFS. You can set "Remote Group" as well. Both of these are at the processor level and do not support Expression Language, so you'd have to set them for the processor. You could use a RouteOnAttribute processor to determine which user should own the files in HDFS and route the flow to the proper PutHDFS processor, but this will be more cumbersome than distributing keytabs to the users. In a secure environment, the users would likely need to have their keytab to write to HDFS anyway since you'd have to authenticate somehow and there's not a way presently to pass a Kerberos ticket to NiFi.
... View more
08-17-2016
04:16 PM
2 Kudos
@Smart Solutions As @Michael Young stated, Zeppelin in 2.4 is TP. Zeppelin goes GA with the upcoming HDP 2.5 release (due out soon) and includes Kerberos integration. A full list of features is not available yet (as it's still in the hands of the devs), but should be available soon.
... View more
08-15-2016
02:43 PM
@jovan karamacoski You can enable multiple tiers of storage and specify where files should be stored to control data placement. Check out the following link: http://hortonworks.com/blog/heterogeneous-storages-hdfs/ If you really need to control which nodes the data goes to as well, you can configure the faster storage only on the faster nodes. This is not recommended because it will lead to an imbalance on the cluster, but it is possible to do.
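For reference, a minimal sketch of pinning a path to a storage tier once the DataNode data directories are tagged with storage types (e.g. [SSD]); the path and policy are placeholders:
hdfs storagepolicies -setStoragePolicy -path /data/hot -policy ALL_SSD
hdfs storagepolicies -getStoragePolicy -path /data/hot
hdfs mover -p /data/hot
The mover relocates existing blocks to match the newly assigned policy.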
... View more