Member since
06-20-2016
251
Posts
196
Kudos Received
36
Solutions
04-27-2017
06:07 PM
1 Kudo
This article assumes you have already identified the GUID associated with your Atlas entity and that the tag in question you wish to associate to this entity already exists. For more information, on how to identify the GUID for your entity, please see this HCC article by Mark Johnson. For example, if we wanted to add a new tag to all hive tables containing the word "claim", we could use the Full Text Search capability to identify all entities (replace admin:admin with the username:password values for an Atlas administrator): curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/discovery/search/fulltext?query=claim Now we are ready to assign a new tag, called "PII", to all such entities. We are using the v1 API for Atlas in HDP 2.5. In HDP 2.6, the Atlas API has been revamped and simplified. Please see this HCC article for more details. Let's construct an example using the first GUID, f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45. We will use the traits resource associated with this entity, and POST our payload to this endpoint. curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/entities/f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45/traits -X POST -H 'Content-Type: application/json' --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'
We can now query the traits of this entity using a GET request. curl -u admin:admin http://$ATLAS_SERVER:21000/api/atlas/entities/f7a24ec6-5b0c-42d8-ba8a-1ac654d24f45/traits We can also see the tag now exists in the Atlas UI:
... View more
Labels:
04-27-2017
05:04 PM
3 Kudos
Q1) What is the difference between using the programs kadmin.local and kadmin, respectively? A1) The difference between kadmin and kadmin.local is that kadmin.local directly accesses the KDC database (a file called principal in /var/kerberos/krb5kdc) and does not use Kerberos for authentication. Since kadmin.local directly accesses the KDC database, it must be run directly on the master KDC as a user with sufficient permissions to read the KDC database. When using kadmin to administer the KDC database, the user is communicating with the kadmind daemon over the network and will authenticate using Kerberos to the KDC master. Hence, the first principal must already exist before connecting over the network—this is the motivation for the existence of kadmin.local. This also means the KDC administrator will need to kinit as the administrator principal (by default, admin/admin@REALM) to run kadmin from another box. Q2) How can we restrict which users can administer the MIT-KDC service? A2) There is an ACL, /var/kerberos/krb5kdc/kadm5.acl, which authorizes access to the KDC database. By default, there is a line:
*/admin@REALM * This provides authorization for all operations to all Kerberos principals that have an instance matching that form, like admin/admin@REALM Q3) What local (POSIX) permissions are needed by MIT-KDC administrators? It’s important to make a distinction between the user's network account and the associated Kerberos principal. The network account needs permissions to read the KDC database when running kadmin.local, per the above. If this user has sudo access on this box then this is sufficient (the KDC database is usually only readable by root). Tangentially, this is a good motivation to run the KDC on a different box than one of the cluster hosts, for separation of concerns between cluster and KDC administrators.
... View more
Labels:
02-18-2017
05:20 PM
2 Kudos
Imagine we have a use case of exchanging data with an external party via AWS S3 buckets: we want to push these messages into our internal Kafka cluster, after enriching each message with additional metadata, in an event-driven fashion. In AWS, this is supported by associating notifications with an S3 bucket. These destinations can make use of a few different destinations, namely, SQS, SNS, and Lambda. We'll focus on the SQS approach, and will make use of NiFi's GetSQS processor. To configure this in AWS, navigate to the S3 bucket and then to the Properties tab, and scroll down to Advanced settings > Events. You'll need to create an SQS queue for this purpose. With this configured, a new SQS message will appear any time an object is created within our S3 bucket. We need to configure some IAM policies in order for our NiFi data flow to be authorized to read from the S3 bucket and to read from the SQS queue. We will authenticate from NiFi using the Access Key and Secret Key associated with a particular IAM user. First, the IAM policy for reading from the S3 bucket called nifi: {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::nifi"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::nifi/*"
]
}
]
}
Second, the IAM policy for reading from--as well as deleting from since we'll configure GetSQS to auto-delete received messages--the SQS queue. We'll need the ARN and URL associated with our SQS queue, these can be retrieved from the SQS Management Console and navigating to the SQS queue name we created above. Note: we could harden this by restricting the permitted SQS actions further. {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sqs:*",
"Resource": "arn:aws:sqs:$REGION:$UUID:$QUEUE_NAME"
}
]
}
You will then need to attach these policies to your IAM user via the IAM Management Console. We are now ready to build the NiFi data flow: For the GetSQS processor, just use the SQS queue URL from the SQS Management console that we retrieved above, the Region, and the Access Key and Secret Key associated with the IAM user. We'll use SplitJSON to extract the name of the file associated with the SQS notification. We'll need this to fetch the object from S3. ExtractText is used to associate the result of the JsonPath expression with a new custom attribute, $filename: Which we'll pass into the FetchS3Object processor: Finally, we can enrich the message with UpdateAttribute using Advanced > Rules and push to our Kafka topic using the PublishKafka processor. I'd like to credit this blog post: https://adamlamar.github.io/2016-01-30-monitoring-an-s3-bucket-in-apache-nifi/
... View more
Labels:
01-31-2017
11:19 PM
2 Kudos
In Zeppelin LDAP Authentication with OpenLDAP and How to Set Up OpenLDAP we've shown how to use LDAP Authentication with Zeppelin. In this article, we'll harden that configuration by ensuring that Zeppelin and OpenLDAP communicate over LDAPS. LDAPS is a secure protocol that uses TLS to assure authenticity, confidentiality, and integrity of communications. This prevents man-in-the-middle attacks that sniff traffic to discover LDAP credentials communicated in plaintext, which could compromise the security of the cluster. The first step is to modify the configuration of the OpenLDAP server, as root, to expose LDAPS connectivity, we'll need to modify /etc/openldap/ldap.conf. Please recall that we created /etc/openldap/certs/myldap.field.hortonworks.com.cert in the How to Set Up OpenLDAP article #TLS_CACERTDIR /etc/openldap/certs
TLS_CACERT /etc/openldap/certs/myldap.field.hortonworks.com.cert
URI ldaps://myldap.field.hortonworks.com ldap://myldap.field.hortonworks.com
BASE dc=field,dc=hortonworks,dc=com We also need to modify /etc/sysconfig/slapd : SLAPD_URLS="ldapi:/// ldap:/// ldaps:///" Then restart slapd: systemctl restart slapd You can confirm that slapd is listening on 636: netstat -anp | grep 636 Finally, confirm TLS connectivity and secure ldapsearch (with the appropriate bind user and password from the previous articles): # should succeed
openssl s_client -connect myldap.field.hortonworks.com:636 </dev/null
# should succeed
ldapsearch -H ldaps://myldap.field.hortonworks.com:636 -D cn=ldapadm,dc=field,dc=hortonworks,dc=com -w $password -b "ou=People,dc=field,dc=hortonworks,dc=com" The next step is the client-side configuration changes. Since we are using a self-signed certificate for the OpenLDAP server, we need to import this into the Java truststore, called cacerts, which is in /etc/pki/ca-trust/extracted/java on my CentOS 7 system. Copy the myldap.field.hortonworks.com.cert file from the OpenLDAP server to the Zeppelin server (this file does not contain sensitive key material, only public keys), and run (making sure you set this certificate to be trusted): keytool -import -alias myldap -file /etc/security/certificates/myldap.field.hortonworks.com.cert -keystore cacerts Otherwise, you will see errors like Root exception is javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed Lastly, in Ambari, we just need to make one small change to the shiro.ini configuration in Zeppelin > Config > Advanced zeppelin-env > shiro_ini_content : ldapRealm.contextFactory.url = ldaps://myldap.field.hortonworks.com:636 Note the protocol change to LDAPS and the port number change to 636. To test, restart the Zeppelin service and confirm that users can still log in to the Zeppelin UI.
... View more
Labels:
01-25-2017
06:50 PM
4 Kudos
Apache Ranger uses an embedded Tomcat server to provide the Web UI functionality for administration of Ranger. A previous HCC article provided details on maintenance of the log files that are managed by the log4j configuration, including xa_portal.log, ranger_admin_perf.log, xa_portal_sql.log.
We're going to focus on maintenance of the access_log* logs that get automatically generated by Tomcat, but which are not managed by this log4j configuration. With embedded Tomcat, the configuration is contained within the code for the AccessLogValve (as you can see, it uses an hourly rotation pattern unless overridden by ranger.accesslog.dateformat).
We'll use the logrotate application in CentOS/RHEL to manage these access_log* logs as the number of files can grow large without rotation and removal in place. You can check to see how many of these files you have on your Ranger Admin node by running (there would be one access_log* file per hour for each day during which the service has ran continuously):
ls /var/log/ranger/admin | cut -d '.' -f 1 | uniq -c
Within /etc/logrotate.d, we'll create a configuration specific to these Ranger logs, as the configuration for logrotate, in /etc/logrotate.conf by default, will include these application-spcific configurations as well.
Create a new file (as root) ranger_access in /etc/logrotate.d in your favorite editor and then insert:
/var/log/ranger/admin/access_log* {
daily
copytruncate
compress
dateext
rotate 5
maxage 7
olddir /var/log/ranger/admin/old
missingok
}
This is just an example logrotate configuration. I'll make note of a couple items, please see the man page for details on each of these options and some additional examples.
The copytruncate option ensures that Tomcat can keep writing to the same file handle (as opposed to writing to a newly-created file which requires recycling Tomcat)
The compress option will use gzip by default
Maxage limits how old the files are that will be kept
Olddir indicates that logs are moved into the directory for rotation
Logrotate will be invoked daily as a cronjob by default, due to the existence of the logrotate file in /etc/cron.daily. You can run logrotate manually by specifying the configuration:
sudo /usr/sbin/logrotate /etc/logrotate.conf
Note that logrotate keeps the state of files in the /var/lib/logrotate.status, and it uses the date of last execution captured there as the reference of what to do with a logfile. You can also run logrotate with the -d flag to test your configuration (this won't actually do anything, it will just produce output regarding what would happen).
sudo /usr/sbin/logrotate -d /etc/logrotate.conf 2> /tmp/logrotate.debug
As a result of this configuration, only 5 days worth of logs are kept, they're kept in the ./old directory, and they're compressed. This ensures that the Ranger admin access_log* logs data does not grow unmanageably large.
... View more
Labels:
07-31-2019
03:23 AM
@slachterman i just followed the instructions provided in the article to move generated flow files to ADLS using PutHDFS processor, But I am getting the below errors. Please help. I have specified all the configurations for the PutHDFS as per the article. It is keep on saying "unable to find the valid certification path to requested target, but not sure where i need to upload the ADLS cred certificate in the PutHDFS processor 2019-07-30 17:24:20,657 ERROR [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=44e51785-016c-1000-901a-6aa4d9167c2c] Failed to access HDFS due to com.microsoft.azure.datalake.store.ADLException: Last encountered exception thrown after 5 tries. [javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException] [ServerRequestId:null]: com.microsoft.azure.datalake.store.ADLException: Error getting info for file / Operation GETFILESTATUS failed with exception javax.net.ssl.SSLHandshakeException : sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target Last encountered exception thrown after 5 tries. [javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException] [ServerRequestId:null] Last encountered exception thrown after 5 tries. [javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException,javax.net.ssl.SSLHandshakeException] [ServerRequestId:null] at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1194) at com.microsoft.azure.datalake.store.ADLStoreClient.getDirectoryEntry(ADLStoreClient.java:741) at org.apache.hadoop.fs.adl.AdlFileSystem.getFileStatus(AdlFileSystem.java:487) at org.apache.nifi.processors.hadoop.PutHDFS$1.run(PutHDFS.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:360) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1942) at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:236) at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:209) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1946) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:316) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1639) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) at sun.security.ssl.Handshaker.process_record(Handshaker.java:965) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1064) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347) at com.microsoft.azure.datalake.store.HttpTransport.makeSingleCall(HttpTransport.java:307) at com.microsoft.azure.datalake.store.HttpTransport.makeCall(HttpTransport.java:90) at com.microsoft.azure.datalake.store.Core.getFileStatus(Core.java:691) at com.microsoft.azure.datalake.store.ADLStoreClient.getDirectoryEntry(ADLStoreClient.java:739) ... 18 common frames omitted
... View more
11-18-2016
03:00 AM
Very nice article. If you got step by step procedure with pre-requisites,could you pls fwd to me (muthukumar.siva@gmail.com) i would like to implement in my environment. Thank you in advance.
... View more
01-25-2018
11:14 PM
Were you able to complete the rest of the steps?
... View more
12-07-2016
05:58 PM
We don't use Ranger (yet) but HiveServer2 with impersonation. So I must specify the Hive principal in the connection string. Works in beeline and with JDBC clients, but I don't know where to specify principal=hive/hivehost@MYREALM in Tableau. Do you have any idea by chance? Thx!
... View more
- « Previous
-
- 1
- 2
- Next »