Member since: 06-20-2016
Posts: 251
Kudos Received: 196
Solutions: 36
My Accepted Solutions
Views | Posted
---|---
9567 | 11-08-2017 02:53 PM
2025 | 08-24-2017 03:09 PM
7745 | 05-11-2017 02:55 PM
6283 | 05-08-2017 04:16 PM
1901 | 04-27-2017 08:05 PM
09-09-2017
01:31 PM
It is a wonderful article. I have a doubt. Assume there is parent.com and parent.child.com, with a parent-child trust configured between them. There is a user childuser in parent.child.com, and I have to delegate it to a constrained delegation user (kcduser) in parent.com, which is also where the resource sits. Referring to your article, my understanding is that I need to get a TGT from parent.com (where kcduser and the resource are available) to access parent.child.com. I need your input to confirm my understanding of this scenario. Thanks, Rajesh
08-31-2018
02:49 AM
Yes, you will basically have to use the same configuration used when combining OpenLDAP and MIT KDC for authentication. The only difference is that you will be using AD as your LDAP server instead of OpenLDAP, and of course you will have to consider the different schemas for users/groups (samAccountName vs uid, etc).
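To make that schema difference concrete, here is a minimal sketch of the same user lookup against each directory (the hostnames, bind DNs, and the user jdoe are hypothetical placeholders):
# OpenLDAP (RFC 2307-style schema) keys users on uid:
ldapsearch -H ldap://openldap.example.com -D "cn=admin,dc=example,dc=com" -W \
  -b "ou=People,dc=example,dc=com" "(uid=jdoe)"
# Active Directory keys users on sAMAccountName:
ldapsearch -H ldap://ad.example.com -D "binduser@EXAMPLE.COM" -W \
  -b "CN=Users,DC=example,DC=com" "(sAMAccountName=jdoe)"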
05-02-2017
06:43 PM
During the initial setup of HDFS using Ambari, pointing the data directory at /home/hadoop/hdfs/data, or any path under /home for that matter, is not allowed. After further digging and research, it turns out this is a deliberate security measure to prevent writing HDFS data to the /home directory. I literally had to take the long way home.
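A minimal workaround sketch, assuming a typical HDP-style layout (the /hadoop/hdfs/data path is just an example of a directory outside /home that can then be set as the DataNode directory, dfs.datanode.data.dir, in Ambari):
# create a DataNode directory outside /home and hand it to the hdfs user
mkdir -p /hadoop/hdfs/data
chown -R hdfs:hadoop /hadoop/hdfs/data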
04-27-2017
07:54 PM
1 Kudo
The HDFS Balancer program can be invoked to rebalance HDFS blocks when data nodes are added to or removed from the cluster. For more information about the HDFS Balancer, see this HCC article. Since Kerberos tickets are designed to expire, a common question that arises in secure clusters is whether one needs to account for ticket expiration (namely, expiration of the TGT) when invoking long-running Balancer jobs. To cut to the chase: the answer depends on just how long the job takes to run. Let's discuss some background context (I am referencing Chris Nauroth's excellent answer on Stack Overflow as well as HDFS-9698 below). The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Many daemon processes, i.e., non-interactive processes, call UserGroupInformation#loginUserFromKeytab at process startup, using a keytab to authenticate to the KDC. Moreover, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:
// try re-login
if (UserGroupInformation.isLoginKeytabBased()) {
  UserGroupInformation.getLoginUser().reloginFromKeytab();
} else if (UserGroupInformation.isLoginTicketBased()) {
  UserGroupInformation.getLoginUser().reloginFromTicketCache();
}
However, the Balancer is not designed to be run as a daemon (in Hadoop 2.7.3, i.e., HDP 2.6 and earlier)! Please see HDFS-9804, which introduces this capability; with it in place, the Balancer can log in with a keytab and the above re-login mechanism takes care of everything. Since the Balancer is designed to be run interactively, the assumption is that kinit has already been run and there is a TGT sitting in the ticket cache. Now we need to understand some Kerberos configuration settings, in particular the distinction between ticket_lifetime and renew_lifetime. Every ticket, including the TGT, has a ticket_lifetime (usually around 18 hours); this strikes a balance between annoying users by requiring them to log in multiple times during their workday and mitigating the risk of stolen TGTs (note there is separate support for preventing replay of authenticators). A ticket can be renewed to extend its lifetime, but only up to its renew_lifetime (usually around 7 days).
Since the TGT is generated by the user and provided to the Balancer (which means that in the Balancer context, UserGroupInformation.isLoginTicketBased() == true), Client#handleSaslConnectionFailure behaves correctly and extends the ticket_lifetime. But there is no way to extend a ticket beyond its renew_lifetime! To summarize: if your Balancer job is going to run longer than the configured renew_lifetime in your environment (a week by default), then you need to worry about ticket renewal (or you need HDFS-9804). Otherwise, you will be fine relying on the RPC framework to renew the TGT's ticket_lifetime (as long as it doesn't eclipse the renew_lifetime).
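As a practical illustration (a sketch only; the principal and lifetimes are placeholders, and both are capped by KDC policy), you can request a renewable TGT before kicking off the Balancer and inspect its limits:
# request a TGT that is renewable for up to 7 days (subject to KDC policy)
kinit -r 7d hdfs@EXAMPLE.COM
# "Expires" shows the ticket_lifetime, "renew until" shows the renew_lifetime
klist
# within the renew window, the RPC layer (or a manual renewal) can extend the ticket
kinit -R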
04-27-2017
06:07 PM
1 Kudo
This article assumes you have already identified the GUID associated with your Atlas entity and that the tag you wish to associate with this entity already exists. For more information on how to identify the GUID for your entity, please see this HCC article by Mark Johnson. For example, if we wanted to add a new tag to all Hive tables containing the word "claim", we could use the Full Text Search capability to identify all such entities (replace admin:admin with the username:password values for an Atlas administrator):
curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/discovery/search/fulltext?query=claim
Now we are ready to assign a new tag, called "PII", to all such entities. We are using the v1 API for Atlas in HDP 2.5; in HDP 2.6, the Atlas API has been revamped and simplified (please see this HCC article for more details). Let's construct an example using the first GUID, f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45. We will use the traits resource associated with this entity and POST our payload to that endpoint.
curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/entities/f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45/traits -X POST -H 'Content-Type: application/json' --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'
We can now query the traits of this entity using a GET request:
curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/entities/f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45/traits
We can also see that the tag now exists in the Atlas UI.
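To tag every entity returned by the full-text search in one pass, something along the following lines can be used. This is only a sketch: it assumes each result in the v1 search response exposes a guid field and that jq is installed; adjust the filter to your Atlas version's response format.
for guid in $(curl -s -u admin:admin "http://$ATLAS_SERVER:21000/api/atlas/discovery/search/fulltext?query=claim" | jq -r '.results[].guid'); do
  # POST the PII trait to each entity's traits resource
  curl -s -u admin:admin "http://$ATLAS_SERVER:21000/api/atlas/entities/$guid/traits" -X POST -H 'Content-Type: application/json' --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'
done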
04-27-2017
05:04 PM
3 Kudos
Q1) What is the difference between using the programs kadmin.local and kadmin, respectively?
A1) The difference is that kadmin.local directly accesses the KDC database (a file called principal in /var/kerberos/krb5kdc) and does not use Kerberos for authentication. Since kadmin.local directly accesses the KDC database, it must be run on the master KDC as a user with sufficient permissions to read the KDC database. When using kadmin to administer the KDC database, the user communicates with the kadmind daemon over the network and authenticates to the KDC master using Kerberos. Hence, the first principal must already exist before connecting over the network; this is the motivation for the existence of kadmin.local. It also means the KDC administrator will need to kinit as the administrator principal (by default, admin/admin@REALM) to run kadmin from another box. A short sketch of both access paths follows after these Q&As.
Q2) How can we restrict which users can administer the MIT-KDC service?
A2) There is an ACL file, /var/kerberos/krb5kdc/kadm5.acl, which authorizes access to the KDC database. By default, it contains the line:
*/admin@REALM *
This grants all operations to every Kerberos principal whose instance matches that form, such as admin/admin@REALM.
Q3) What local (POSIX) permissions are needed by MIT-KDC administrators?
A3) It's important to make a distinction between the user's network account and the associated Kerberos principal. The network account needs permission to read the KDC database when running kadmin.local, per the above. If this user has sudo access on the KDC box, that is sufficient (the KDC database is usually only readable by root). Tangentially, this is a good motivation to run the KDC on a different box than one of the cluster hosts, for separation of concerns between cluster and KDC administrators.
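For illustration, a minimal sketch of the two access paths (the realm, admin principal, and test principal are placeholders):
# on the KDC host itself, as root: reads the database file directly, no Kerberos needed
kadmin.local -q "listprincs"
# from any other host: talk to kadmind over the network as the admin principal
# (kadmin authenticates as that principal and will prompt for its password)
kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey testuser@EXAMPLE.COM"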
02-18-2017
05:20 PM
2 Kudos
Imagine we have a use case of exchanging data with an external party via AWS S3 buckets: we want to push these messages into our internal Kafka cluster, after enriching each message with additional metadata, in an event-driven fashion. In AWS, this is supported by associating notifications with an S3 bucket. These notifications can be sent to a few different destinations, namely SQS, SNS, and Lambda. We'll focus on the SQS approach and make use of NiFi's GetSQS processor. To configure this in AWS, navigate to the S3 bucket, go to the Properties tab, and scroll down to Advanced settings > Events. You'll need to create an SQS queue for this purpose. With this configured, a new SQS message will appear any time an object is created within our S3 bucket. We need to configure some IAM policies in order for our NiFi data flow to be authorized to read from the S3 bucket and from the SQS queue. We will authenticate from NiFi using the Access Key and Secret Key associated with a particular IAM user. First, the IAM policy for reading from the S3 bucket called nifi:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::nifi"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::nifi/*"
]
}
]
}
Second, the IAM policy for reading from (as well as deleting from, since we'll configure GetSQS to auto-delete received messages) the SQS queue. We'll need the ARN and URL associated with our SQS queue; these can be retrieved by opening the SQS Management Console and navigating to the queue name we created above. Note: we could harden this by restricting the permitted SQS actions further.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sqs:*",
"Resource": "arn:aws:sqs:$REGION:$UUID:$QUEUE_NAME"
}
]
}
You will then need to attach these policies to your IAM user via the IAM Management Console. We are now ready to build the NiFi data flow. For the GetSQS processor, just use the SQS queue URL we retrieved above from the SQS Management Console, the Region, and the Access Key and Secret Key associated with the IAM user. We'll use SplitJson to extract the name of the file referenced by the SQS notification, which we'll need in order to fetch the object from S3. ExtractText is used to associate the result of the JsonPath expression with a new custom attribute, $filename, which we then pass into the FetchS3Object processor (see the sketch below for the underlying JsonPath idea). Finally, we can enrich the message with UpdateAttribute using Advanced > Rules and push to our Kafka topic using the PublishKafka processor. I'd like to credit this blog post: https://adamlamar.github.io/2016-01-30-monitoring-an-s3-bucket-in-apache-nifi/
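For reference, here is a sketch of the extraction step outside NiFi. The message body is assumed to follow the standard S3 event notification format, and SQS_MESSAGE_BODY is a placeholder for one received message:
# pull the S3 object key out of an S3 event notification; this is the same value
# the SplitJson/ExtractText steps surface as the $filename attribute in the flow
echo "$SQS_MESSAGE_BODY" | jq -r '.Records[0].s3.object.key'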
06-14-2019
09:57 AM
The solution for me was to chmod 777. @adel gacem 755 is not enough.
02-20-2017
08:02 PM
Please accept the above answer if it addressed your question. To submit an idea, use Create > Post Idea (the Create button in the upper-right toolbar).
02-03-2017
06:53 PM
2 Kudos
So here are the values that work in my environment:
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.pattern.kerb=^(.*?)@(.*?)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.value.kerb=$1
Also, in Ranger, the NiFi nodes need to be added as internal users, and policies need to be created for them to access the proxy, flow, and data resources.
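As a quick sanity check (a sketch; node.crt stands in for a node's certificate and the DN shown is hypothetical), you can print the subject DN that NiFi will run through the mapping pattern above:
openssl x509 -in node.crt -noout -subject
# e.g. subject= CN=nifi-node1.example.com, OU=NIFI
# with the pattern/value pair above, this maps to the identity "nifi-node1.example.com"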