Member since
06-20-2016
251
Posts
196
Kudos Received
36
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 9689 | 11-08-2017 02:53 PM |
 | 2061 | 08-24-2017 03:09 PM |
 | 7860 | 05-11-2017 02:55 PM |
 | 6508 | 05-08-2017 04:16 PM |
 | 1956 | 04-27-2017 08:05 PM |
12-19-2016
12:29 AM
Thanks @mqureshi. I was testing with impersonation turned off for the Livy interpreter. The curl test was just to confirm that the zeppelin service could authenticate to the Livy REST API using SPNEGO. My assumption was that, with impersonation turned off, Livy would launch the Spark application as the livy principal. Interestingly, with impersonation enabled, I see a different error:
java.net.ConnectException: Connection refused (Connection refused)
By the way, the behavior is the same with the interpreter specified as 'livy.spark'.
12-18-2016
10:11 PM
I am attempting to use the Livy interpreter in Zeppelin on a Kerberized cluster running HDP 2.5. The UI shows "Error running rest call; nested exception is org.springframework.web.client.HttpClientErrorException: 403 Forbidden", but I don't see any additional information in the Zeppelin or Livy logs. This looks like a SPNEGO authentication issue of some kind. Using curl from the Zeppelin node, I was able to authenticate with --negotiate and the ticket in the cache. How can I troubleshoot this error further?
Labels: Apache Zeppelin
12-15-2016
09:42 PM
12 Kudos
Hortonworks Data Flow 2.1 was recently released and includes a new feature that can be used to connect to an Azure Data Lake Store (ADLS). This is a fantastic use case for HDF as the data movement engine supporting a connected data plane architecture spanning on-premises and cloud deployments. This how-to assumes that you have created an Azure Data Lake Store account and that you have remote access to an HDInsight (HDI) head node in order to retrieve some dependent JARs.
We will make use of the new Additional Classpath Resources feature for the GetHDFS and PutHDFS processors in NiFi 1.1, included within HDF 2.1. The following additional dependencies are required for ADLS connectivity:
adls2-oauth2-token-provider-1.0.jar
azure-data-lake-store-sdk-2.0.4-SNAPSHOT.jar
hadoop-azure-datalake-2.0.0-SNAPSHOT.jar
jackson-core-2.2.3.jar
okhttp-2.4.0.jar
okio-1.4.0.jar
The first three Azure-specific JARs can be found in /usr/lib/hdinsight-datalake/ on the HDI head node. The Jackson JAR can be found in /usr/hdp/current/hadoop-client/lib, and the last two can be found in /usr/hdp/current/hadoop-hdfs-client/lib.
Once you've gathered these JARs, copy them to every NiFi node and place them in a newly created directory, /usr/lib/hdinsight-datalake.
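Before moving on, it may help to confirm that each NiFi node really has all six JARs in place. The following is a minimal sketch, not part of HDF or NiFi: the helper name missing_jars is hypothetical, and the JAR names are copied from the list above, so adjust them if the versions on your HDI head node differ.

```python
from pathlib import Path

# JAR names as listed in this article; versions on your HDI head node may differ.
REQUIRED_JARS = [
    "adls2-oauth2-token-provider-1.0.jar",
    "azure-data-lake-store-sdk-2.0.4-SNAPSHOT.jar",
    "hadoop-azure-datalake-2.0.0-SNAPSHOT.jar",
    "jackson-core-2.2.3.jar",
    "okhttp-2.4.0.jar",
    "okio-1.4.0.jar",
]


def missing_jars(directory, required=REQUIRED_JARS):
    """Return the required JAR names that are not present in `directory`.

    Hypothetical helper for illustration; a missing directory is treated
    as containing no JARs.
    """
    d = Path(directory)
    present = {p.name for p in d.iterdir()} if d.is_dir() else set()
    return [name for name in required if name not in present]


if __name__ == "__main__":
    missing = missing_jars("/usr/lib/hdinsight-datalake")
    print("All JARs present" if not missing else f"Missing: {missing}")
```

Running this on each NiFi node should report any JAR that was not copied over.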
In order to authenticate to ADLS, we'll use OAuth2. This requires the TenantID associated with your Azure account. The simplest way to obtain this is via the Azure CLI, using the azure account show command.
You will also need to create an Azure AD service principal as well as an associated key. Navigate to Azure AD > App Registrations > Add.
Take note of the Application ID (aka the Client ID) and then generate a key via the Keys blade. Please note that the Client Secret value will be hidden after you leave this blade, so be sure to copy it and store it securely.
The service principal associated with this application will need service-level authorization to access the Azure Data Lake Store instance that you created as a prerequisite. This can be done via the IAM blade for your ADLS instance (please note you will not see the Add button in the top toolbar unless you have administrative access for your Azure subscription).
In addition, the service principal will need appropriate directory-level authorizations for the ADLS directories that it should be able to read or write. These can be assigned via Data Explorer > Access within your ADLS instance.
At this point, you should have your TenantID, ClientID, and Client Secret available, and we can now configure core-site.xml in order to access Azure Data Lake via the PutHDFS processor.
The important core-site values are as follows (note the variables identified with the '$' sigil below, including part of the refresh URL path).
<property>
<name>dfs.adls.oauth2.access.token.provider.type</name>
<value>ClientCredential</value>
</property>
<property>
<name>dfs.adls.oauth2.refresh.url</name>
<value>https://login.microsoftonline.com/$YOUR_TENANT_ID/oauth2/token</value>
</property>
<property>
<name>dfs.adls.oauth2.client.id</name>
<value>$YOUR_CLIENT_ID</value>
</property>
<property>
<name>dfs.adls.oauth2.credential</name>
<value>$YOUR_CLIENT_SECRET</value>
</property>
<property>
<name>fs.AbstractFileSystem.adl.impl</name>
<value>org.apache.hadoop.fs.adl.Adl</value>
</property>
<property>
<name>fs.adl.impl</name>
<value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
</property>
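If you would rather generate these properties than edit them by hand, the snippet below is one possible sketch. The function name adls_core_site is hypothetical, the property names and values are taken from the listing above, and the output is wrapped in the standard Hadoop configuration root element. It only emits the ADLS-related properties, so you would still need to merge them into your existing core-site.xml.

```python
import xml.etree.ElementTree as ET


def adls_core_site(tenant_id, client_id, client_secret):
    """Render the ADLS-related core-site properties as an XML string.

    Hypothetical helper for illustration; property names come from the
    listing in this article.
    """
    props = {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        # The tenant ID is part of the refresh URL path.
        "dfs.adls.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": client_secret,
        "fs.AbstractFileSystem.adl.impl": "org.apache.hadoop.fs.adl.Adl",
        "fs.adl.impl": "org.apache.hadoop.fs.adl.AdlFileSystem",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; substitute your own TenantID, ClientID, and secret.
    print(adls_core_site("YOUR_TENANT_ID", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET"))
```

Because the result contains the Client Secret in plain text, treat the generated file with the same care as the secret itself.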
We're now ready to configure the PutHDFS processor in NiFi.
For Hadoop Configuration Resources, point to your modified core-site.xml, including the properties above, and an hdfs-site.xml (no ADLS-specific changes are required).
Additional Classpath Resources should point to the /usr/lib/hdinsight-datalake directory to which we copied the dependencies on all NiFi nodes.
The input to this PutHDFS processor can be any FlowFile. It may be simplest to use the GenerateFlowFile processor to create the input with some Custom Text such as
The time is ${now()}
When you run the data flow, you should see the FlowFiles appear in the ADLS directory specified in the processor, which you can verify using the Data Explorer in the Azure Portal, or via some other means.
12-15-2016
06:31 PM
@cduby you'll need to log in to Ambari as a user that has access to Manage Users and Groups, like the admin user. The match is made on the username string of the authenticated user (noting that mapping rules may modify that value); the Hive view makes use of impersonation via whichever system user is running Ambari Server. Best practice is to use LDAP for both Ambari and Ranger, pointing to the same LDAP, so that both systems use the same source of truth for user and group identities.
12-15-2016
03:02 PM
@cduby that's expected behavior. Internal Ranger users can log in to the Ranger UI, depending on their permissions, and can have Ranger policies assigned to them, but they cannot necessarily log in to the Ambari UI. Ambari has its own local users, which are stored in Ambari's database. Ranger syncing external users from Unix doesn't affect this.
12-12-2016
03:54 PM
Hi @Gerd Koenig, please see my linked HCC article in the parent comment. The template XML is attached to that post. Essentially, the ReplaceText processor will fail, so FlowFiles that contain an incomplete JSON record will get routed to the PutFile processor within the exception flow.
12-12-2016
01:01 AM
@aengineer I saw this consistently as well when creating this HCC article. It seems that the Ranger plugin doesn't always write a complete final record in the file. In the NiFi flow described in that article, I just dropped these invalid records, as this was appropriate for the purposes of the analysis in question.
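Outside of NiFi, the same idea of dropping incomplete records can be sketched in a few lines of Python. This is only an illustration and not the flow from the article: it assumes the audit file holds one JSON record per line, and the function name complete_records is hypothetical.

```python
import json


def complete_records(lines):
    """Yield parsed records, skipping blank lines and incomplete JSON.

    Assumes one JSON record per line (an assumption for this sketch).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # A truncated record, such as a partially written last line,
            # is dropped rather than failing the whole file.
            continue
```

For example, list(complete_records(open(path))) would keep every well-formed record and silently skip a truncated last line, where path is whatever audit file you are inspecting.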
12-08-2016
06:27 PM
@Sami Ahmad because this version of the command uses the keytab. With Kerberos, access to the keytab file is equivalent to knowledge of the password. Please see https://web.mit.edu/kerberos/krb5-1.12/doc/basic/keytab_def.html. Please accept this answer if it was helpful in resolving your issue.
12-08-2016
06:26 PM
The Sandbox team at Hortonworks made this decision for the following reasons:
1) Customers are asking for it. The rate of adoption has increased 30% in the past year.
2) It saves development time to build one image rather than three different images.
3) It gives better consistency across VirtualBox, Azure, and VMware.