Member since: 06-20-2016
251 Posts | 196 Kudos Received | 36 Solutions
12-19-2016
12:29 AM
Thanks @mqureshi. I was testing with impersonation turned off for the Livy interpreter; the curl test was just to confirm that the Zeppelin service could authenticate to the Livy REST API using SPNEGO. My assumption with impersonation turned off is that Livy would launch the Spark application as the livy principal. Interestingly, with impersonation enabled, I am seeing a different error:
java.net.ConnectException: Connection refused (Connection refused). By the way, the behavior is the same with the interpreter specified as 'livy.spark'.
12-18-2016
10:11 PM
I am attempting to use the Livy interpreter in Zeppelin in a Kerberized cluster running HDP 2.5. I am seeing "Error running rest call; nested exception is org.springframework.web.client.HttpClientErrorException: 403 Forbidden" in the UI, but I don't see additional information in the Zeppelin or Livy logs. This seems to be a SPNEGO authentication issue of some kind. I tried using curl to connect from the Zeppelin node and was able to authenticate using --negotiate with the ticket in the cache. How can I troubleshoot this error further?
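For reference, the curl test looked roughly like this (the keytab path, principal, and Livy host are placeholders for my environment; 8998 is the default Livy port):
# obtain a ticket using the zeppelin service keytab
kinit -kt /etc/security/keytabs/zeppelin.server.kerberos.keytab zeppelin/$(hostname -f)
# call the Livy REST API with SPNEGO authentication
curl --negotiate -u : http://livy-host.example.com:8998/sessions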
Labels:
- Apache Zeppelin
12-15-2016
09:42 PM
12 Kudos
Hortonworks DataFlow 2.1 was recently released and includes a new feature that can be used to connect to an Azure Data Lake Store. This is a fantastic use case for HDF as the data movement engine supporting a connected data plane architecture spanning on-premises and cloud deployments. This how-to assumes that you have created an Azure Data Lake Store account and that you have remote access to an HDInsight head node in order to retrieve some dependent JARs.
We will make use of the new Additional Classpath Resources feature for the GetHdfs and PutHdfs processors in NiFi 1.1, included within HDF 2.1. The following additional dependencies are required for ADLS connectivity:
adls2-oauth2-token-provider-1.0.jar
azure-data-lake-store-sdk-2.0.4-SNAPSHOT.jar
hadoop-azure-datalake-2.0.0-SNAPSHOT.jar
jackson-core-2.2.3.jar
okhttp-2.4.0.jar
okio-1.4.0.jar
The first three Azure-specific JARs can be found in /usr/lib/hdinsight-datalake/ on the HDInsight head node. The Jackson JAR can be found in /usr/hdp/current/hadoop-client/lib, and the last two can be found in /usr/hdp/current/hadoop-hdfs-client/lib.
Once you've gathered these JARs, distribute them to all NiFi nodes and place them in a newly created /usr/lib/hdinsight-datalake directory (see the example below).
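A rough way to script the copy (the hostname nifi-node1 and staging path /tmp/adls-jars are placeholders for your environment):
# on the HDInsight head node: stage the six JARs in one directory
mkdir -p /tmp/adls-jars
cp /usr/lib/hdinsight-datalake/*.jar /tmp/adls-jars/
cp /usr/hdp/current/hadoop-client/lib/jackson-core-2.2.3.jar /tmp/adls-jars/
cp /usr/hdp/current/hadoop-hdfs-client/lib/okhttp-2.4.0.jar /usr/hdp/current/hadoop-hdfs-client/lib/okio-1.4.0.jar /tmp/adls-jars/
# for each NiFi node, create the directory and copy the JARs over
ssh nifi-node1 'mkdir -p /usr/lib/hdinsight-datalake'
scp /tmp/adls-jars/*.jar nifi-node1:/usr/lib/hdinsight-datalake/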
In order to authenticate to ADLS, we'll use OAuth2. This requires the Tenant ID associated with your Azure account. The simplest way to obtain this is via the Azure CLI, using the azure account show command.
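For example, with the classic Azure CLI (the output format varies by CLI version, and the ID shown is just a placeholder):
azure account show
# look for the Tenant ID field in the output, e.g.
# data:    Tenant ID : 00000000-0000-0000-0000-000000000000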
You will also need to create an Azure AD service principal as well as an associated key. Navigate to Azure AD > App Registrations > Add.
Take note of the Application ID (also known as the Client ID) and then generate a key via the Keys blade. Please note the Client Secret value will be hidden after you leave this blade, so be sure to copy it somewhere safe and store it securely.
The service principal associated with this application will need service-level authorization to access the Azure Data Lake Store instance created as a prerequisite. This can be done via the IAM blade for your ADLS instance (please note you will not see the Add button in the top toolbar unless you have administrative access for your Azure subscription).
In addition, the service principal will need to have appropriate directory-level authorizations for the ADLS directories to which it should be authorized to read or write. These can be assigned via Data Explorer > Access within your ADLS instance.
At this point you should have your Tenant ID, Client ID, and Client Secret available, and we can now configure core-site.xml in order to access Azure Data Lake via the PutHdfs processor.
The important core-site values are as follows (note the variables identified with the '$' sigil below, including part of the refresh URL path).
<property>
  <name>dfs.adls.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>dfs.adls.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/$YOUR_TENANT_ID/oauth2/token</value>
</property>
<property>
  <name>dfs.adls.oauth2.client.id</name>
  <value>$YOUR_CLIENT_ID</value>
</property>
<property>
  <name>dfs.adls.oauth2.credential</name>
  <value>$YOUR_CLIENT_SECRET</value>
</property>
<property>
  <name>fs.AbstractFileSystem.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.Adl</value>
</property>
<property>
  <name>fs.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
</property>
We're now ready to configure the PutHdfs processor in NiFi.
For Hadoop Configuration Resources, point to your modified core-site.xml (including the properties above) and an hdfs-site.xml (no ADLS-specific changes are required).
Additional Classpath Resources should point to the /usr/lib/hdinsight-datalake directory to which we copied the dependencies on all NiFi nodes.
The input to this PutHdfs processor can be any FlowFile; it may be simplest to use the GenerateFlowFile processor to create the input with some Custom Text such as:
The time is ${now()}
When you run the data flow, you should see the FlowFiles appear in the ADLS directory specified in the processor, which you can verify using the Data Explorer in the Azure Portal, or via some other means.
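As an additional check, if a Hadoop client has the same core-site.xml properties and the ADLS JARs on its classpath, you can list the target directory from the command line (the account name and path below are placeholders):
# make the ADLS JARs visible to the hdfs command
export HADOOP_CLASSPATH=/usr/lib/hdinsight-datalake/*
# list the directory the PutHdfs processor writes to
hdfs dfs -ls adl://youradlsaccount.azuredatalakestore.net/your/target/directory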
12-15-2016
06:31 PM
@cduby you'll need to log in to Ambari as a user that has access to Manage Users and Groups, like the admin user. It just matches on the username string (noting that mapping rules may modify that value), based on the authenticated user (the Hive view makes use of impersonation for whichever system user is running Ambari Server). Best practice is to use LDAP for both Ambari and Ranger, pointing to the same LDAP, so that both systems use the same source of truth for user and group identities.
12-15-2016
03:02 PM
@cduby that's expected behavior. Internal Ranger users can log in to the Ranger UI, depending on their permissions (and have Ranger policies assigned to them), but not necessarily the Ambari UI. Ambari has its own local users that are stored in Ambari's database. Ranger syncing external users from Unix doesn't affect this.
12-12-2016
03:54 PM
Hi @Gerd Koenig, please see my linked HCC article in the parent comment. The template XML is attached to that post. Essentially, the ReplaceText processor will fail, so FlowFiles that contain an incomplete JSON record will get routed to the PutFile processor within the exception flow.
12-12-2016
01:01 AM
@aengineer I saw this consistently as well when creating this HCC article. It seems the Ranger plugin doesn't always write a complete record for the last entry in the file. In the NiFi flow described in that article, I simply dropped these invalid records, as this was appropriate for the purposes of the analysis in question.
12-08-2016
06:27 PM
@Sami Ahmad because this version of the command uses the keytab. With Kerberos, access to the keytab file is equivalent to knowledge of the password. Please see https://web.mit.edu/kerberos/krb5-1.12/doc/basic/keytab_def.html. Please accept this answer if it was helpful in resolving your issue.
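For illustration, compare the two forms of the command (the principal and keytab path are placeholders):
# prompts for the user's password
kinit user@EXAMPLE.COM
# no password prompt: the key material is read from the keytab file
kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM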
12-08-2016
06:26 PM
The Sandbox team at Hortonworks made this decision for the following reasons:
1) Customers are asking for it; the rate of adoption has increased 30% in the past year.
2) It saves development time to build one image vs. three different images.
3) It provides better consistency amongst the VirtualBox, Azure, and VMware images.