Member since: 06-20-2016
Posts: 251
Kudos Received: 196
Solutions: 36
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9630 | 11-08-2017 02:53 PM
 | 2048 | 08-24-2017 03:09 PM
 | 7778 | 05-11-2017 02:55 PM
 | 6380 | 05-08-2017 04:16 PM
 | 1926 | 04-27-2017 08:05 PM
04-30-2017
05:35 PM
1 Kudo
In a Kafka cluster, there are multiple brokers to support the design goals of availability and high throughput. The broker list defines where the Producer can find one or more Brokers to determine the Leader for each topic. This does not need to be the full set of Brokers in your cluster, but it should include at least two in case the first Broker is not available. One needn't worry about figuring out which Broker is the leader for the topic (and partition): the Producer knows how to connect to a Broker, ask it for the metadata, and then connect to the correct Broker.
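As a minimal illustration (the host names, port, topic, and HDP-style script path below are placeholders/assumptions, not from the original answer), the console producer only needs a couple of brokers in its broker list and discovers the partition leaders from the cluster metadata:

# Only two brokers are listed; the producer fetches cluster metadata from
# whichever broker responds and then connects to the leader of each partition.
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
  --broker-list broker1.example.com:6667,broker2.example.com:6667 \
  --topic test-topic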
04-27-2017
08:05 PM
1 Kudo
Hi @Joshua Petree, what is the purpose of this cluster? For any cluster that's beyond a Dev sandbox, you need 3 to 5 masters. In order for Zookeeper to function properly, you need at least three ZK instances. It's not recommended to run a Secondary NameNode or any other services, such as ZK, on a DataNode. Also, in order for HDFS to be HA, you need to run a Standby NameNode. Remember that Hadoop is designed with the assumption that DataNodes will fail. If you start putting critical services on DataNodes, not only will it hurt your performance, but it will also create points of failure that affect the overall health of the cluster.
04-27-2017
07:54 PM
1 Kudo
The HDFS Balancer program can be invoked to rebalance HDFS blocks when data nodes are added to or removed from the cluster. For more information about the HDFS Balancer, see this HCC article. Since Kerberos tickets are designed to expire, a common question that arises in secure clusters is whether one needs to account for ticket expiration (namely, of the TGT) when invoking long-running Balancer jobs. To cut to the chase: the answer depends on just how long the job takes to run. Let's discuss some background context (I am referencing Chris Nauroth's excellent answer on Stack Overflow as well as HDFS-9698 below). The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Many daemon processes, i.e., non-interactive processes, call UserGroupInformation#loginUserFromKeytab at process startup, using a keytab to authenticate to the KDC. Moreover, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:

// try re-login
if (UserGroupInformation.isLoginKeytabBased()) {
UserGroupInformation.getLoginUser().reloginFromKeytab();
} else if (UserGroupInformation.isLoginTicketBased()) {
UserGroupInformation.getLoginUser().reloginFromTicketCache();
}

However, the Balancer is not designed to be run as a daemon (in Hadoop 2.7.3, i.e., HDP 2.6 and earlier)! Please see HDFS-9804, which introduces this capability. With that in place, the Balancer would log in with a keytab and the above re-login mechanism would take care of everything. Since the Balancer is designed to be run interactively, the assumption is that kinit has already been run and there is a TGT sitting in the ticket cache. Now we need to understand some Kerberos configuration settings, in particular the distinction between ticket_lifetime and renew_lifetime. Every ticket, including a TGT, has a ticket_lifetime (usually around 18 hours). This strikes a balance between annoying users by requiring them to log in multiple times during their workday and mitigating the risk of TGTs being stolen (note there is separate support for preventing replay of authenticators). A ticket can be renewed to extend its lifetime, but only up to its renew_lifetime (usually around 7 days).
Since the TGT is generated by the user and provided to the Balancer (which means that in the Balancer context, UserGroupInformation.isLoginTicketBased() == true), Client#handleSaslConnectionFailure behaves correctly and will renew the ticket, extending its ticket_lifetime. But there is no way to extend a ticket beyond its renew_lifetime! To summarize: if your Balancer job is going to run longer than the configured renew_lifetime in your environment (a week by default), then you need to worry about ticket renewal (or you need HDFS-9804). Otherwise, you can rely on the RPC framework to renew the TGT's ticket_lifetime (as long as it doesn't eclipse the renew_lifetime).
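As a quick way to see these two lifetimes in practice, the standard MIT Kerberos client tools can inspect and renew the TGT sitting in the ticket cache (the principal and requested lifetime below are placeholders; the actual maximums are capped by the KDC and by krb5.conf):

# Request a renewable TGT; the renewable life actually granted is capped by
# KDC policy and by renew_lifetime in krb5.conf (typically 7 days):
kinit -r 7d hdfs@EXAMPLE.COM

# Show the cached ticket, its flags, its expiry, and its "renew until" time:
klist -f

# Renew the TGT manually; this only works until renew_lifetime is reached:
kinit -R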
04-27-2017
06:07 PM
1 Kudo
This article assumes you have already identified the GUID associated with your Atlas entity and that the tag you wish to associate with this entity already exists. For more information on how to identify the GUID for your entity, please see this HCC article by Mark Johnson. For example, if we wanted to add a new tag to all Hive tables containing the word "claim", we could use the Full Text Search capability to identify all matching entities (replace admin:admin with the username:password values for an Atlas administrator):

curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/discovery/search/fulltext?query=claim

Now we are ready to assign a new tag, called "PII", to all such entities. We are using the v1 API for Atlas in HDP 2.5. In HDP 2.6, the Atlas API has been revamped and simplified; please see this HCC article for more details. Let's construct an example using the first GUID, f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45. We will use the traits resource associated with this entity and POST our payload to this endpoint:

curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/entities/f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45/traits -X POST -H 'Content-Type: application/json' --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'
We can now query the traits of this entity using a GET request:

curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/entities/f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45/traits

We can also see that the tag now exists in the Atlas UI.
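If you want to apply the tag to every GUID returned by the full-text search, a small loop over the same traits endpoint works. This is just a sketch: the ATLAS_SERVER value is a placeholder, and only the single GUID from the example above is shown; append the rest of the GUIDs from your search results.

ATLAS_SERVER=atlas.example.com   # placeholder host
for guid in f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45; do   # add further GUIDs from your search results
  curl -u admin:admin "http://$ATLAS_SERVER:21000/api/atlas/entities/$guid/traits" \
    -X POST -H 'Content-Type: application/json' \
    --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'
done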
04-27-2017
05:04 PM
3 Kudos
Q1) What is the difference between using the programs kadmin.local and kadmin, respectively? A1) The difference between kadmin and kadmin.local is that kadmin.local directly accesses the KDC database (a file called principal in /var/kerberos/krb5kdc) and does not use Kerberos for authentication. Since kadmin.local directly accesses the KDC database, it must be run directly on the master KDC as a user with sufficient permissions to read the KDC database. When using kadmin to administer the KDC database, the user is communicating with the kadmind daemon over the network and will authenticate using Kerberos to the KDC master. Hence, the first principal must already exist before connecting over the network—this is the motivation for the existence of kadmin.local. This also means the KDC administrator will need to kinit as the administrator principal (by default, admin/admin@REALM) to run kadmin from another box. Q2) How can we restrict which users can administer the MIT-KDC service? A2) There is an ACL, /var/kerberos/krb5kdc/kadm5.acl, which authorizes access to the KDC database. By default, there is a line:
*/admin@REALM *

This provides authorization for all operations to any Kerberos principal with an instance matching that form, such as admin/admin@REALM. Q3) What local (POSIX) permissions are needed by MIT-KDC administrators? A3) It's important to make a distinction between the user's network account and the associated Kerberos principal. The network account needs permission to read the KDC database when running kadmin.local, per the above. If this user has sudo access on the KDC host, then this is sufficient (the KDC database is usually readable only by root). Tangentially, this is a good motivation to run the KDC on a different box than one of the cluster hosts, for separation of concerns between cluster and KDC administrators.
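To make the distinction concrete, here is a minimal sketch of both tools in use (the realm, test principal, and sudo assumption are placeholders, not from the original answer; kadmin.local must run on the KDC host itself):

# On the master KDC, as a user that can read the database under /var/kerberos/krb5kdc:
sudo kadmin.local -q "listprincs"

# From any other host, authenticating over the network as an admin principal
# that already exists and is authorized in kadm5.acl:
kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey testuser@EXAMPLE.COM"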
04-03-2017
09:15 PM
1 Kudo
@Vadim Vaks it may be worth noting that the more recent GenerateTableFetch processor provides additional flexibility for parallelizing the retrieval of data from an RDBMS.
03-22-2017
10:16 PM
@Pablo Pedemonte @Andy LoPresto this was addressed in Ambari 2.4.2 per https://issues.apache.org/jira/browse/AMBARI-18910
03-10-2017
06:02 PM
1 Kudo
Sunile and I troubleshot this issue further. The first thing we did was enable sun.security.krb5.debug=true; this can be done in bootstrap.conf (see this doc). What we found in nifi-bootstrap.log was:

INFO [NiFi logging handler] org.apache.nifi.StdOut Found unsupported keytype (18) for smanjee@FIELD

Keytype 18 is aes256-cts-hmac-sha1-96 (see this page), which was the cipher used when we created the keytab for the smanjee@FIELD user. We created a new keytab using the cipher des3-cbc-sha1, and this resolved the issue. Note: I am not recommending that weak ciphers be used in Production environments.
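For reference, here is a hedged sketch of the two steps involved; the java.arg index, keytab path, and admin principal are assumptions, while the realm, user, and ciphers come from the post above. Note that exporting a keytab with kadmin's xst/ktadd regenerates the principal's key, so older keytabs for that principal stop working.

# 1) Enable Kerberos debug logging by adding an unused JVM argument to NiFi's
#    bootstrap.conf, then restart NiFi:
#      java.arg.16=-Dsun.security.krb5.debug=true

# 2) Re-export the keytab with an explicit encryption type via kadmin:
kadmin -p admin/admin@FIELD -q "xst -e des3-cbc-sha1:normal -k /etc/security/keytabs/smanjee.keytab smanjee@FIELD"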
02-20-2017
08:02 PM
Please accept the above answer if it addressed your question. To post an idea, use Create > Post Idea (the Create button is in the upper right toolbar).
02-18-2017
05:46 PM
2 Kudos
Not natively with current functionality, but you may be interested in a product called DataGuise. If you rolled your own solution to identify which entities satisfy such a condition, you could use the Atlas API to associate tags with those entities; please see this HCC answer. This would be a nice Idea to post to HCC.