Member since: 06-20-2016
Posts: 251
Kudos Received: 196
Solutions: 36
My Accepted Solutions
Views | Posted
---|---
9636 | 11-08-2017 02:53 PM
2049 | 08-24-2017 03:09 PM
7797 | 05-11-2017 02:55 PM
6392 | 05-08-2017 04:16 PM
1930 | 04-27-2017 08:05 PM
10-26-2016
08:14 PM
2 Kudos
@cduby I published an HCC article on this topic, please see: https://community.hortonworks.com/articles/63664/how-to-create-a-ranger-policy-that-prohibits-combi.html
10-19-2016
06:16 PM
9 Kudos
Apache Knox provides a gateway to HiveServer2 which can be used to proxy connections to Hive from BI tools like Tableau Desktop. With a secure cluster, HiveServer2 is kerberized, which usually means the client must have a valid TGT in its ticket cache in order to authenticate to HiveServer2. The Knox gateway can simplify this in many environments by supporting client authentication to kerberized Hive without a TGT: the client authenticates to Knox via, say, LDAP authentication, and HiveServer2 trusts Knox to proxy connections on behalf of this user. Another benefit of using Knox is that the client doesn't need to know where HiveServer2 is hosted, so if an administrator needs to move it to another master node, this is transparent to the applications connecting via Knox.
In the environment for this walkthrough, I have Tableau Desktop 10.0.1 installed, and I'll be using HDP 2.5. The first step is to ensure you have the latest Hortonworks ODBC Driver for Apache Hive (2.1.5 at the time of this writing), which can be downloaded from https://hortonworks.com/downloads/. I will assume that Knox has already been installed. If you don't have an LDAP server in your environment for testing, Ambari includes a Demo LDAP service that can be started from the Knox service. Further, we assume Ranger is being used for Hive authorization (so Hive impersonation is disabled, i.e., hive.server2.enable.doAs is false).
In order to configure Hive for Knox, we'll need to change the transport mode (hive.server2.transport.mode) to 'http' (it is set to 'binary' by default) and then restart Hive. Assuming Ranger is being used to authorize access to Knox, we'll also need to create an appropriate policy in Ranger for the users and groups which require access.
On our local machine, we'll need access to the certificate Knox is using for TLS. In production environments, your Desktop and PKI administrators may already have taken steps to ensure that your system has the appropriate certificate installed. By default, Knox uses a self-signed certificate which is not trusted by our system. We can extract this certificate from the Knox server (assuming we have appropriate access) and then specify its location when connecting from Tableau Desktop. Please note this certificate does not contain sensitive data such as private key material. We can execute the following commands on the Knox server host to extract the certificate:
knoxserver=$(hostname -f)
openssl s_client -connect ${knoxserver}:8443 <<<'' | openssl x509 -out /tmp/knox.crt
We then need to copy the certificate to our local environment, using scp or some other utility, and take note of its location. We are now ready to connect to Hive from Tableau Desktop. You may want to test connectivity from beeline first to ease later troubleshooting, so that any issues identified are clearly specific to Tableau (although please note beeline will use JDBC, whereas Tableau uses ODBC).
We will use the Hortonworks Hadoop Hive native connector. We will specify our Knox server hostname in the Server field, the port over which we are connecting to Knox using TLS, an Authentication method of HTTPS, and the username and password of an LDAP user that has access (via the Ranger policies for Knox and for Hive that have been configured). Please see the Advanced users-ldif within the Knox configuration for identifying appropriate users in the Demo LDAP (such as guest, sam, and tom). We need to specify the HTTP path of gateway/default/hive, select the Require SSL checkbox, and click the "No custom configuration . . ." orange link to specify the path at which we've saved the Knox certificate (or at which our Desktop Administrator has provided a root certificate that was used to sign Knox's certificate), as seen in the screenshot below.
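For the beeline test mentioned above, here is a rough sketch, assuming the Demo LDAP user guest, the default Knox port of 8443, and placeholder truststore path and password values; note that the JDBC client needs the certificate imported into a Java truststore rather than the raw .crt file:
# Import the Knox certificate into a Java truststore for the JDBC client (placeholder paths/passwords)
keytool -import -noprompt -alias knox -file /tmp/knox.crt -keystore /tmp/knox-truststore.jks -storepass changeit
# Connect through Knox using HTTP transport and the gateway/default/hive path
beeline -u "jdbc:hive2://knoxserver.example.com:8443/;ssl=true;sslTrustStore=/tmp/knox-truststore.jks;trustStorePassword=changeit;transportMode=http;httpPath=gateway/default/hive" -n guest -p guest-password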
10-18-2016
04:23 PM
That's right, the URL would specify HTTPS and the port on which NiFi is running on that host. With the new masterless architecture in HDF 2.0, the URL specified in the RPG can point to any cluster node (in previous versions it had to be the NCM). Please accept the above answer if it was helpful to you.
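For example, assuming a node named nifi-node1.example.com and the HTTPS port your NiFi nodes are configured to use (9091 is a common default in Ambari-managed HDF), the RPG URL would look something like:
https://nifi-node1.example.com:9091/nifi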
10-18-2016
03:31 PM
2 Kudos
@mayki wogno if you want to use the S2S protocol to distribute the SFTP fetches over the 4 NiFi nodes, then it will be necessary to have an RPG. ListSFTP would be configured to run on the primary node only and would connect to the RPG, which would point back to the same NiFi cluster. You would then connect the associated input port to the process group containing the FetchSFTP and PutHDFS processors. In a NiFi cluster, each node processes the same dataflow (with the exception of isolated processors like ListSFTP, which run only on the primary node). Without a distribution mechanism such as the S2S protocol, there is no way to partition the file-listing metadata so that each node fetches a distinct subset of the files on the SFTP server.
10-13-2016
04:24 AM
Thanks @Muthukumar S, can you please provide further details? How does role-based authentication work with an on-premise source outside of AWS?
10-08-2016
08:11 PM
4 Kudos
In HDF 2.0, administrators can secure access to individual NiFi components in order to support multi-tenant authorization. This provides organizations the ability to create least privilege policies for distinct groups of users.
For example, let's imagine we have a NiFi Team and a Hadoop Team at our company, and that the Hadoop Team can only access dataflows it has created, whereas the NiFi Team can access all dataflows. NiFi 1.0 in HDF 2.0 can use different authorizers, such as file-based policies (managed within NiFi) and Ranger-based policies (managed within Ranger), as well as custom, pluggable authorizers.
In this example, we'll use Ranger. For more detail on configuring Ranger as the authorizer for NiFi, please see this article. To separate the different teams' dataflows, we'll create separate process groups for each team. In NiFi, access policies are inheritable, supporting simpler policy management with the flexibility of overriding access at the component level. This means that all processors, as well as any nested process groups, within the Hadoop Team's root process group will be accessible by the Hadoop Team automatically.
Let's see an example of the canvas when nifiadmin, a member of the NiFi team, is logged in.
On the other hand, when hadoopadmin, a member of the Hadoop Team, is logged in, we'll see a different representation, given the different level of access.
When hadoopadmin drills down into the NiFi Team's process group (notice the title is blank without read access), this user cannot make any changes (the toolbar items are grayed out).
Let's take a look at how this was configured in Ranger. The nifiadmin user has full access to NiFi, so it has read and write access to all resources.
Since the hadoopadmin user has more restrictive access, we'll configure separate policies in Ranger for this user. Firstly, hadoopadmin will need read and write access to the /flow resource in order to access the UI and modify any dataflows.
Secondly, this user needs a policy for the root Hadoop Team process group. In order to configure this, we need to capture the globally unique identifier, or GUID, associated with this process group, which is visible and can be copied from the NiFi UI. The Ranger policy will provide read and write access to this process group within the /process-groups resource. Notice that hadoopadmin can now modify the dataflow within the Hadoop Team process group (the toolbar items are not grayed out and new processors can be dragged and dropped onto the canvas).
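If you prefer scripting this rather than clicking through the Ranger UI, the same policy can be created via Ranger's public REST API. This is only a sketch: the service name (hdf_nifi), Ranger host, credentials, and the process group GUID below are placeholders, and it assumes the NiFi service definition's nifi-resource resource type:
# Hypothetical example: create a read/write policy on the Hadoop Team process group for hadoopadmin
curl -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://rangerhost.example.com:6080/service/public/v2/api/policy \
  -d '{
        "service": "hdf_nifi",
        "name": "Hadoop Team process group",
        "resources": { "nifi-resource": { "values": ["/process-groups/00000000-0000-1000-0000-000000000000"] } },
        "policyItems": [ { "users": ["hadoopadmin"],
                           "accesses": [ { "type": "READ", "isAllowed": true },
                                         { "type": "WRITE", "isAllowed": true } ] } ]
      }'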
10-06-2016
08:52 PM
Yes, @Houssam Manik, the values are configurable in the Ambari UI. Please accept this answer if it addresses your question.
10-06-2016
03:43 PM
@Houssam Manik this is configurable within the Ranger User Sync configuration. In particular, the User Configs tab contains the User Group Name Attribute setting (which defaults to memberof,ismemberof) and the Group Configs tab contains the Group Filter setting (which defaults to uniqueMember={0}, where the substituted parameter is the full distinguished name of the user). Please see this doc. The LDAP Connection Check Tool is also helpful when configuring LDAP properties for Ranger User Sync.
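If it's unclear which of these attributes your directory actually exposes, a quick ldapsearch against a known user and group can confirm the values before adjusting the sync settings. A sketch with placeholder host, bind DN, search bases, and user:
# Check which membership attributes the user entry exposes
ldapsearch -x -H ldap://ldaphost.example.com:389 -D "cn=admin,dc=example,dc=com" -W \
  -b "ou=users,dc=example,dc=com" "(uid=someuser)" memberof ismemberof
# Check which groups list the user's full DN as a uniqueMember
ldapsearch -x -H ldap://ldaphost.example.com:389 -D "cn=admin,dc=example,dc=com" -W \
  -b "ou=groups,dc=example,dc=com" "(uniqueMember=uid=someuser,ou=users,dc=example,dc=com)" cn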
09-30-2016
12:03 AM
7 Kudos
When running a distcp process from HDFS to AWS S3, credentials are required to authenticate to the S3 bucket. Passing these into the S3A URI would leak secret values into application logs. Storing these secrets in core-site.xml is also not ideal because this means any user with hdfs CLI access can access the S3 bucket to which these AWS credentials are tied. The Hadoop Credential API can be used to manage access to S3 in a more fine-grained way.
The first step is to create a local JCEKS file in which to store the AWS Access Key and AWS Secret Key values:
hadoop credential create fs.s3a.access.key -provider localjceks://file/path/to/aws.jceks
<enter Access Key value at prompt>
hadoop credential create fs.s3a.secret.key -provider localjceks://file/path/to/aws.jceks
<enter Secret Key value at prompt>
We'll then copy this JCEKS file to HDFS with the appropriate permissions:
hdfs dfs -put /path/to/aws.jceks /user/admin/
hdfs dfs -chown admin:admin /user/admin/aws.jceks
hdfs dfs -chmod 400 /user/admin/aws.jceks
We can then use the credential provider when calling hadoop distcp, as follows:
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/user/admin/aws.jceks /user/admin/file s3a://my-bucket/
Notice that only the admin user can read this credentials file. If other users attempt to run the command above, they will receive a permissions error because they can't read aws.jceks. This also works with hdfs commands, as in the example below:
hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/user/admin/aws.jceks -ls s3a://my-bucket
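To confirm the provider contains the expected aliases without printing their values, the credential store can be listed; this uses the same HDFS provider path as above:
hadoop credential list -provider jceks://hdfs/user/admin/aws.jceks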
09-26-2016
06:56 PM
Do you have zeppelin.server.addr set to the actual IP or host of the Zeppelin server?
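If you're not sure, a quick way to check the current value on the Zeppelin host (assuming the usual HDP config location) is:
grep -A1 zeppelin.server.addr /etc/zeppelin/conf/zeppelin-site.xml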