Created on 04-17-2020 08:07 AM - edited on 04-30-2020 12:47 AM by VidyaSargur
In this article, I document how to use CFM 1.0.1.0 to interact with Apache Impala. The same steps apply if you are using HDF or Apache NiFi.
The latest official Impala JDBC driver that works with NiFi is version 2.6.4; earlier versions also work.
At the time of this writing, any newer driver causes class conflicts between the NiFi JVM and the driver's own use of log4j.
Prerequisites
Downloading and extracting the JDBC driver:
- The JDBC drivers can be found at Impala JDBC Connector 2.6.15 for Cloudera Enterprise.
- On that page, select version 2.6.4. If a version higher than 2.6.15 becomes available in the future, use that instead.
- Download and extract 2.6.4, and make a note of where it extracts to.
- Ensure that the user that runs the NiFi JVM (nifi) has read permission on that path.
- The jar file that you will use is called ImpalaJDBC41.jar.
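The permission requirement above can be checked from the shell before touching NiFi. The following is a minimal sketch: the function name check_driver_jar, the IMPALA_JDBC_JAR variable, and the /opt/drivers/impala path are hypothetical, so substitute the directory you actually extracted the driver to.

```shell
#!/usr/bin/env bash
# Report whether a driver jar exists and is readable by the current user.
# Run this as the NiFi service account (for example from `sudo -u nifi bash`)
# so the result reflects what the NiFi JVM will be able to read.
check_driver_jar() {
  local jar="$1"
  if [ -r "$jar" ]; then
    echo "readable: $jar"
  else
    echo "not readable: $jar"
    return 1
  fi
}

# Hypothetical extraction path; replace it with your own.
check_driver_jar "${IMPALA_JDBC_JAR:-/opt/drivers/impala/ImpalaJDBC41.jar}" || true
```

If the check reports "not readable", fix the ownership or mode of the jar and its parent directories before configuring the controller service.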
Create the Impala table and load a sample dataset into HDFS:
- Download the tips.csv dataset and add it to HDFS:
hdfs dfs -put data/tips.csv /user/hive/warehouse/tips/
- Create your Impala table:
impala-shell -i <impala_daemon_hostname>:21000 -q ' CREATE TABLE default.tips ( `total_bill` FLOAT, `tip` FLOAT, `sex` STRING, `smoker` STRING, `day` STRING, `time` STRING, `size` TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LOCATION "hdfs:///user/hive/warehouse/tips/";'
* These steps were taken from this article.
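The table above declares seven columns with a comma as the field delimiter, so it can be worth checking the CSV before uploading it. This is only a sketch: count_bad_rows is a hypothetical helper, and it splits on every comma, so it assumes the file contains no quoted fields with embedded commas.

```shell
#!/usr/bin/env bash
# Print how many rows in a comma-delimited file do not have the expected
# number of fields. Splits naively on "," (no CSV quoting support).
count_bad_rows() {
  local file="$1" expected="$2"
  awk -F',' -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}

# The tips table has 7 columns; only run the check if the file is present.
if [ -f data/tips.csv ]; then
  count_bad_rows data/tips.csv 7
fi
```

A result of 0 means every row has seven fields. Any other number points to rows that Impala may load with NULL or shifted values.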
Configure NiFi to interact with Impala:
On the NiFi canvas, drag in an ExecuteSQL processor.
Configure the Database Connection Pooling Service property on the ExecuteSQL processor.
This property points to the DBCPConnectionPool controller service, which you will need to configure.
The driver documentation does a good job of explaining the different settings you can pass. If you will interact with an Impala that is TLS-secured and/or Kerberized, there are options for that. In my example, I am interacting with a TLS-secured and Kerberized Impala.
In the controller services section, configure your DBCPConnectionPool and set the following properties:
- Database Connection URL
My example:
jdbc:impala://YourImpalaHostFQDN:YourPort
- Database Driver Class Name
com.cloudera.impala.jdbc41.Driver
- Database Driver Location(s)
Set this to the path of the JDBC driver (ImpalaJDBC41.jar) you downloaded earlier.
Back in the ExecuteSQL processor, add your SQL command. This example runs a simple select query by setting the SQL select query property to: select * from default.tips
That should be all you need.
If you are interacting with a TLS-secured and/or Kerberized Impala, you will need to look at the driver documentation for the options that apply to you. For reference, my connection string looked like the following when connecting to a TLS-secured and Kerberized Impala:
jdbc:impala://MyImpalaHost:21050;AuthMech=1;KrbHostFQDN=MyImpalaHostFQDN;KrbServiceName=impala;ssl=1;SSLTrustStore=/My/JKS/Trustore;SSLTrustStorePwd=YourJKSPassword
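Long connection strings like the one above are easy to mistype, so it can help to assemble them from parts. The following is a minimal sketch; build_impala_url is a hypothetical helper, and the property names and placeholder values are simply the ones from the example above, joined with semicolons.

```shell
#!/usr/bin/env bash
# Join a host, a port, and any number of key=value driver properties
# into one JDBC URL of the form jdbc:impala://host:port;key=value;...
build_impala_url() {
  local host="$1" port="$2"
  shift 2
  local url="jdbc:impala://${host}:${port}"
  local prop
  for prop in "$@"; do
    url="${url};${prop}"
  done
  printf '%s\n' "$url"
}

# Placeholder values copied from the example connection string above.
build_impala_url MyImpalaHost 21050 \
  AuthMech=1 \
  KrbHostFQDN=MyImpalaHostFQDN \
  KrbServiceName=impala \
  ssl=1 \
  SSLTrustStore=/My/JKS/Trustore \
  SSLTrustStorePwd=YourJKSPassword
```

Paste the printed value into the Database Connection URL property of the DBCPConnectionPool. With no properties after the port, the function prints the plain host-and-port form shown earlier.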
Created on 03-09-2021 08:43 PM
Tested. Works. Awesome.
Created on 08-22-2022 09:53 AM - edited 08-22-2022 09:55 AM
Does anyone know if there is a way to use Impala over TLS that doesn't require putting passwords in cleartext? (CFM 2.0.4)
Created on 07-04-2024 12:01 AM
Can I use multiple Impala daemons in the connection string, or is there any other way to use multiple Impala daemons? (CDP 7.1.7)