Member since: 09-15-2015
Posts: 75
Kudos Received: 33
Solutions: 4
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 1413 | 02-22-2016 09:32 PM
 | 2271 | 12-11-2015 03:27 AM
 | 8405 | 10-26-2015 10:16 PM
 | 7472 | 10-15-2015 06:09 PM
12-08-2015
01:07 AM
5 Kudos
Q: What are the use cases for Centrify?
A: Integrating AD with Linux without deploying local users, similar to how SSSD is configured. The size of the cluster and/or the domains is big enough that it is hard to manage with SSSD; Centrify greatly simplifies the management of this type of environment. Centrify ldapproxy also abstracts the complexity of integrating with other filers/appliances like Isilon, NetApp, etc. ldapproxy fronts Hadoop and doesn't expose the internals of AD, only information about a zone. It is usually used for machine-to-machine authentication.

Q: If there are multiple domains in a forest, how does Centrify know which domain controller to use to authenticate a user?
A: Centrify walks the forest tree and figures out which domain controller to use to authenticate the user. It utilizes the Domain Controller Service to perform this action; it doesn't use the krb5.conf file. The Centrify agent knows which forest and domain controllers it belongs to. It is PAM- and site-aware, and its base authentication mechanism is Kerberos. It builds an index of domain controllers and DNS servers and tags them based on response time. Based on this information, the agent knows if a particular DC or DNS server has issues and will not use it. DNS must be set up properly, including reverse lookups, for Centrify to work. The agents support authenticating the same user when that user exists across different domains.

Q: What happens when Centrify agents fail?
A: Centrify does not store AD information and there is no policy server; it completely leverages the AD infrastructure to scale out. Centrify DirectControl (CDC) watches all Centrify agents and restarts them when they fail. See the diagram below for reference. If all else fails, Centrify can fall back to NTLM if it needs to. For example, some users don't have Kerberos enabled on their laptops due to inherent issues and have to resort to NTLM.

Q: What's the best practice for laying out the Centrify policies?
A: The basic building block of Centrify policies is the zone. A zone is how Centrify organizes the data inside AD, and a zone corresponds to one cluster. The data is essentially user information, UNIX group information, UNIX computer information, role-based access control, and more. This is built on the Service Connection Point, an AD object type available since Windows 2003, which Centrify links back to the real AD object. This provides flexibility in naming conventions for zones and in which AD objects they link to. The Service Connection Points can be seen in the "Active Directory Users and Computers" window as shown below. These service connection points are what PAM uses to authenticate users.

The image below shows the best-practice layout for creating policies in Centrify. All users are defined under Zones->UNIX Data->Users. Remember, all users and groups are created in AD; what shows up here are just pointers to the AD user objects. These users will eventually be inherited by the child zones. The Hadoop cluster is the boundary for Centrify policies: no Hadoop node should belong to multiple zones. The only exception is when an RDBMS is used for Hadoop components that need it, e.g. Ambari, Oozie, Hive. Centrify agents support multiple domains where the same user exists across domains, and Hadoop jobs pick up the real AD user. It is best to name child zones in lower case, and the name must match the Hadoop cluster name.
In the sample policy above, "smesecurity" is the name of the child zone, and it is also the name of the Hadoop cluster, with matching case. Only the nodes within this cluster should exist in Zones->Global->Child Zones->Computers. Global users are not automatically pushed down to child zones; they have to be explicitly added. For users to successfully log in to Linux machines, they have to have a complete profile - UID, GID, and a Role Assignment. The Role Assignment grants the access. There will be users that exist in the child zones but not in the parent zone; these are normally the service accounts that live only in the child zone. The OU structure has to line up with how the zones are structured - this is the best practice. It is possible to redefine the same user in a child zone with different properties, essentially overriding what is defined globally for that user. For large cluster installations, it is easier to use VPA, part of the DirectControl component, which automates the creation of user profiles in Centrify by simply dropping users into AD groups. This is done through PowerShell or a Linux/UNIX command interface. All the policy information entered in Centrify is stored in AD. See below: the green box shows everything that was defined in Centrify, including the "smesecurity" child zone. This is also replicated across domain controllers for redundancy.

Q: Centrify creates service principals for nfs and http. Will this create issues with Kerberizing HDP?
A: Yes. Centrify has its own Kerberos module for nfs and http. When Kerberizing clusters with Ambari, Ambari automatically generates principals for the nfs and http services, and this clashes with Centrify. To prevent issues, update the file /etc/centrifydc/centrifydc.conf on all machines and look for the property adclient.krb5.service.principals. Remove the "nfs" and "http" entries so it looks like this:

adclient.krb5.service.principals: ftp cifs

If for some reason the nfs and http entries were not removed and the Kerberos wizard in Ambari was run, the NFS Gateway, DataNode and other components that depend on HTTP will fail. To resolve this, update centrifydc.conf on all machines and remove nfs and http as described above. Also remove the http and nfs SPNs from AD. Then, on all machines, run the following commands:

# adreload
# service centrifydc restart
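Applying this edit by hand across a large cluster is tedious. Below is a minimal sketch of one way to push it out; it assumes passwordless ssh as root and a hosts.txt file listing the cluster nodes (both are assumptions, not part of the original post).

# hypothetical helper: apply the same centrifydc.conf edit on every node, then reload and restart
for h in $(cat hosts.txt); do
  ssh root@$h "sed -i 's/^adclient.krb5.service.principals:.*/adclient.krb5.service.principals: ftp cifs/' /etc/centrifydc/centrifydc.conf && adreload && service centrifydc restart"
done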
Q: Centrify ldapproxy won't start using TLS; the certificate cannot be found.
A: A common cause of ldapproxy not starting successfully is certificate names and casing not matching between AD and Centrify. Check /var/centrify/net/certs/ to see whether the certificate names match. Make sure the file /etc/centrifydc/openldap/slapd.conf has entries for the Centrify certificates. See the sample below.

# Centrify specific
TLSCACertificateFile /var/centrify/net/certs/auto_ComputerForLdaps_CA.pem
TLSCertificateFile /var/centrify/net/certs/auto_ComputerForLdaps.cert
TLSCertificateKeyFile /var/centrify/net/certs/auto_ComputerForLdaps.key
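A quick way to sanity-check those entries is sketched below; it assumes the .cert file is PEM-encoded, which may not hold in every setup.

# list the certificate files referenced in slapd.conf
ls -l /var/centrify/net/certs/
# inspect the subject, issuer and validity dates and compare the names (including case) with AD
openssl x509 -in /var/centrify/net/certs/auto_ComputerForLdaps.cert -noout -subject -issuer -dates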
Q: How do Centrify computer roles play into Hadoop clusters?
A: Computer roles allow you to define a set of rights for a logical group of computers. Ambari, Oozie and the Hive Metastore all use RDBMSs, and there is a growing trend of organizations preferring Oracle and SQL Server. For example, an Oracle admin and the Oracle server(s) are defined in a computer role, and the admin rights are applied to those servers regardless of location. Those servers can be used by multiple Hadoop clusters. The provisioning of computer role assignments can be done at the zone level or at the node level. There is also the concept of delegating zone control - over a zone's computers and users - which can be used to specify which group has admin rights to it (not root rights but AD rights - see image below).

Q: When a new AD is added to the forest, how does Centrify pick it up?
A: There are configurations that allow Centrify agents to automatically walk the tree of AD domains and discover new AD servers within the forest. The discovery process is time based and can be changed. The agents also keep track of which AD controllers are up or down. There are PTR records in the AD DNS Manager, as shown below, that are used by Centrify agents to discover Domain Controllers and Global Catalog servers.

Q: Linux servers have their own DNS services and AD has its own built-in directory services. It's a painful process to point the Linux servers to AD and build PTR records for them. How does Centrify make this more seamless?
A: Centrify supports integrating with two different DNS environments (e.g. hortonworks.net and hortonworks.com) through a feature called "alias". Though possible and supported, it is not recommended to set up Centrify and Hadoop to deal with this type of configuration.

Q: What's the behavior of Centrify when a user logs in to machines using ssh?
A: If the user provides a password to log in, a Kerberos ticket is generated automatically. If an ssh key is used, no ticket is generated automatically; the user has to run kinit. When forwardable tickets are turned on in Windows Kerberos systems, the user does not have to kinit again.

Q: How does Centrify sync with the latest AD changes?
A: Centrify has a utility called adflush to pull down the changes from AD. It can be an expensive process depending on what information is being pulled down. adflush is a perfect tool for developers in POC mode.

Q: How can I blacklist users in Centrify?
A: Enter the users you want to block in the file /etc/centrifydc/users.ignore.

Q: How do you safely snapshot Centrify?
A: If you snapshot machines running Centrify agents, later roll back to a snapshot, and the keytab file has changed in the meantime, the machines won't be able to authenticate with AD. Make sure the keytabs are the same when rolling back to a specific snapshot.

Q: With a very large cluster (in the thousands of nodes), how do you scale with Centrify and AD?
A: It is recommended to deploy the Domain Controller in the same rack space as the Hadoop nodes, and you want your AD to be replicated. Hadoop will hammer AD with requests, and you want to make sure that AD can handle it. Centrify is agent based, so there are no issues with scaling; the agents know which domain controllers to go to and which ones they can connect to fastest.
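As a small illustration of the ssh answer above (a minimal sketch; the user name and realm are placeholders, not from the original post): after a key-based login there is no TGT in the cache, so the user requests one explicitly.

# no ticket is created automatically on a key-based ssh login; request one manually
kinit aduser@EXAMPLE.COM
# confirm the ticket cache now holds a TGT
klist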
11-20-2015
09:54 PM
2 Kudos
Audit logs stored in the Ranger Audit DB need to be piped to a SIEM system. I need to know which table(s) I can query to pull failed policies (i.e. "Denied" access). This information will eventually be pushed to the SIEM.
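For reference, a hedged sketch of the kind of query intended: it assumes the DB audit destination is a MySQL database named ranger_audit with the default xa_access_audit table, where access_result = 0 marks a denied request, and a rangerlogger DB user; verify the schema and connection details against your Ranger version before relying on it.

# hypothetical example only -- database, user and schema names are assumptions
mysql -u rangerlogger -p ranger_audit \
  -e "select * from xa_access_audit where access_result = 0 order by event_time desc limit 100;"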
Labels:
- Apache Ranger
11-05-2015
06:57 PM
Followed the recommended AH link from @Neeraj. See details below.

Logged in to postgresql:

ambari=> select * from hostcomponentstate where component_name LIKE '%RANGER%';
 id  | cluster_id | component_name    | version      | current_stack_id | current_state  | host_id | service_name | upgrade_state | security_state
-----+------------+-------------------+--------------+------------------+----------------+---------+--------------+---------------+----------------
 163 | 2          | RANGER_KMS_SERVER | 2.3.2.0-2950 | 4                | INSTALLED      | 5       | RANGER_KMS   | NONE          | UNKNOWN
 164 | 2          | RANGER_USERSYNC   | UNKNOWN      | 4                | INSTALL_FAILED | 4       | RANGER       | NONE          | UNKNOWN
 165 | 2          | RANGER_ADMIN      | UNKNOWN      | 4                | INSTALL_FAILED | 4       | RANGER       | NONE          | UNKNOWN
(3 rows)

ambari=> select * from hostcomponentdesiredstate where component_name LIKE '%RANGER%';
 cluster_id | component_name | desired_stack_id | desired_state | host_id | service_name | admin_state | maintenance_state | security_state | restart_required
------------+----------------+------------------+---------------+---------+--------------+-------------+-------------------+----------------+------------------
(0 rows)

ambari=> select * from servicecomponentdesiredstate where component_name LIKE '%RANGER%';
 component_name    | cluster_id | desired_stack_id | desired_state | service_name
-------------------+------------+------------------+---------------+--------------
 RANGER_KMS_SERVER | 2          | 4                | INSTALLED     | RANGER_KMS
 RANGER_ADMIN      | 2          | 4                | INSTALLED     | RANGER
 RANGER_USERSYNC   | 2          | 4                | INSTALLED     | RANGER
(3 rows)

Delete from the tables:

ambari=> delete from hostcomponentstate where component_name LIKE '%RANGER%';
DELETE 3
ambari=> delete from servicecomponentdesiredstate where component_name LIKE '%RANGER%';
DELETE 3

Then delete the services:

[root@great-wall02 ~]# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://great-wall01.cloud.hortonworks.com:8080/api/v1/clusters/smesecurity/services/RANGER
[root@great-wall02 ~]# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://great-wall01.cloud.hortonworks.com:8080/api/v1/clusters/smesecurity/services/RANGER_KMS

Deleted the databases for Ranger: 'ranger', 'ranger_kms', 'ranger_audit'. Tried reinstalling Ranger only, but now I'm getting the error below. It looks like there is still some metadata in the Ambari DB that needs to be cleaned up. What Ambari tables should I clean up? (A possible check is sketched after the log below.)

05 Nov 2015 10:51:25,540 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-security-site
05 Nov 2015 10:51:25,541 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-site
05 Nov 2015 10:51:25,541 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-log4j
05 Nov 2015 10:51:25,542 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-site
05 Nov 2015 10:51:25,542 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-policy
05 Nov 2015 10:51:25,543 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-log4j
05 Nov 2015 10:51:25,543 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-env
05 Nov 2015 10:51:25,544 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-env
05 Nov 2015 10:51:25,544 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-properties
05 Nov 2015 10:51:25,545 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-policymgr-ssl
05 Nov 2015 10:51:25,545 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-log4j
05 Nov 2015 10:51:25,545 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-security
05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-audit
05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=dbks-site
05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-env
05 Nov 2015 10:51:25,547 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-site
05 Nov 2015 10:51:25,547 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-site
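A hedged way to see which config types Ambari still has recorded for these components: the db/user names below assume the default embedded Postgres setup, and the table/column names may differ between Ambari versions, so treat this as a sketch rather than a verified cleanup procedure.

# list leftover Ranger/KMS/AMS config types still referenced in the Ambari DB
psql -U ambari ambari -c "select distinct type_name from clusterconfig where type_name like 'ranger%' or type_name like 'kms%' or type_name like 'ams%';"
psql -U ambari ambari -c "select distinct type_name from clusterconfigmapping where type_name like 'ranger%' or type_name like 'kms%' or type_name like 'ams%';"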
11-05-2015
05:30 AM
Getting this Ambari server error now. It came up after clicking Deploy (for Ranger, Ranger KMS and Ambari Metrics):

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "clusterservices_pkey"
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2161)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1890)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:559)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:890)
... 130 more
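The constraint name points at the clusterservices table, so a hedged next step is to check whether stale RANGER rows are still there from the earlier delete attempts (connection details assume the default embedded Ambari Postgres; adjust as needed).

# look for leftover service rows that would collide with the new insert
psql -U ambari ambari -c "select * from clusterservices where service_name like '%RANGER%';"
# if stale rows show up, removing them should let the deploy insert succeed:
# psql -U ambari ambari -c "delete from clusterservices where service_name like '%RANGER%';"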
11-05-2015
05:20 AM
1 Kudo
All nodes have the same repo version. I noticed that ambari.repo is gone - possibly deleted during host cleanup (the Python script)? Downloaded ambari.repo again and am reinstalling Ambari Metrics. Stay tuned.
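For completeness, a sketch of restoring the repo file: the URL below is a placeholder for the Ambari 2.1.2 / CentOS 6 repo and should be matched to your actual Ambari version and OS.

# re-fetch the Ambari repo file and retry the package install
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.2/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum clean all
yum install ambari-metrics-collector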
11-05-2015
05:06 AM
After removing Ambari Metrics, there is a side effect on Kafka. Kafka shouldn't be having problems after this process, since it was running fine prior to it.

[2015-11-04 20:58:47,740] FATAL (kafka.Kafka$)
java.lang.ClassNotFoundException: org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at kafka.utils.CoreUtils$.createObject(CoreUtils.scala:231)
at kafka.metrics.KafkaMetricsReporter$$anonfun$startReporters$1.apply(KafkaMetricsReporter.scala:59)
at kafka.metrics.KafkaMetricsReporter$$anonfun$startReporters$1.apply(KafkaMetricsReporter.scala:58)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at kafka.metrics.KafkaMetricsReporter$.startReporters(KafkaMetricsReporter.scala:58)
at kafka.Kafka$.main(Kafka.scala:62)
at kafka.Kafka.main(Kafka.scala)
[2015-11-04 21:00:51,935] FATAL (kafka.Kafka$)
java.lang.ClassNotFoundException: org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at kafka.utils.CoreUtils$.createObject(CoreUtils.scala:231)
at kafka.metrics.KafkaMetricsReporter$$anonfun$startReporters$1.apply(KafkaMetricsReporter.scala:59)
at kafka.metrics.KafkaMetricsReporter$$anonfun$startReporters$1.apply(KafkaMetricsReporter.scala:58)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at kafka.metrics.KafkaMetricsReporter$.startReporters(KafkaMetricsReporter.scala:58)
at kafka.Kafka$.main(Kafka.scala:62)
at kafka.Kafka.main(Kafka.scala)
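A hedged guess at the cause: the broker config still references the AMS metrics reporter even though the sink jar went away with Ambari Metrics. The check below assumes a standard Ambari-managed HDP layout and the kafka.metrics.reporters property name; verify both in your environment.

# see whether the broker still tries to load the AMS reporter class
grep -n "KafkaTimelineMetricsReporter" /etc/kafka/conf/server.properties
# if it does, clear kafka.metrics.reporters in the kafka-broker config
# (via Ambari or directly in server.properties) and restart the broker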
11-05-2015
05:02 AM
Tried deleting Ambari Metrics from Ambari through the DELETE REST API and rerunning yum install ambari-metrics-collector. Same output.

[root@great-wall02 ~]# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://great-wall01.cloud.hortonworks.com:8080/api/v1/clusters/smesecurity/services/AMBARI_METRICS
[root@great-wall02 ~]# yum install ambari-metrics-collector
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.spro.net
* extras: mirrors.sonic.net
* updates: mirror.hostduplex.com
Setting up Install Process
No package ambari-metrics-collector available.
Error: Nothing to do
11-05-2015
04:59 AM
[root@great-wall02 ~]# yum install ambari-metrics-collector
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.spro.net
* extras: mirrors.sonic.net
* updates: centos-distro.cavecreek.net
Setting up Install Process
No package ambari-metrics-collector available.
Error: Nothing to do
[root@great-wall02 ~]# rpm -q ambari-metrics-collector
package ambari-metrics-collector is not installed
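"No package ... available" usually means yum has no repository that carries the package. A quick hedged check (nothing here is specific to the original post beyond the package name):

# confirm an Ambari repo is actually configured on this host
ls /etc/yum.repos.d/ | grep -i ambari
yum repolist enabled | grep -i ambari
# if neither command returns anything, the ambari.repo file is missing on this host,
# which would explain why ambari-metrics-collector cannot be found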
11-05-2015
04:47 AM
1 Kudo
On a 6-node cluster, using Ambari 2.1.2 / HDP 2.3.2.

Scenario 1: When installing HDP, at the point where it is installing the services across all nodes, it suddenly fails, and the failure is due to Ambari Metrics/Monitors failing. Full stack trace from the Ambari UI below.

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 131, in <module>
AmsCollector().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 34, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 395, in install_packages
Package(name)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 45, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector' returned 1. Error: Nothing to do

Running the same command from the last line of the error yields the same response. There were no Ambari Metrics logs generated, and the Ambari server log didn't have any info.

Scenario 2: I reset ambari-server and cleaned up all hosts, re-ran the Ambari wizard, and installed all services except Ambari Metrics. HDP installed successfully. I then added Ambari Metrics back and I'm getting the same error.

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 131, in <module>
AmsCollector().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 34, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 395, in install_packages
Package(name)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 45, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector' returned 1. Error: Nothing to do
Labels:
- Apache Ambari
- Apache Ranger