Cloudera Employee

 

In this article, I document how to use Cloudera Flow Management (CFM) 1.0.1.0 to interact with Apache Impala. The article also applies if you are using HDF or Apache NiFi.

 

At the time of this writing, the latest official Impala JDBC driver that works with NiFi is version 2.6.4; use 2.6.4 or an earlier release.

Later drivers cause class conflicts between the NiFi JVM and the driver's own use of log4j.

 

Prerequisites

 

Download and extract the JDBC driver:

 

  1. The JDBC drivers can be found under Impala JDBC Connector 2.6.15 for Cloudera Enterprise.
  2. Select version 2.6.4 (or, if a version higher than 2.6.15 becomes available in the future, use that instead).
  3. Download and extract 2.6.4, and note the directory it extracts to.
  4. Ensure that the user running the NiFi JVM (nifi) has read permission on that path.
  5. The jar file you will use is ImpalaJDBC41.jar.
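Before pointing NiFi at the jar, it can help to confirm that the file really contains the driver class and that the NiFi user can read it. The following is a minimal Java sketch, not part of the driver or of NiFi: the class name DriverJarCheck is hypothetical, and the entry path is simply derived from the driver class name com.cloudera.impala.jdbc41.Driver that is configured later in this article. Running it as the nifi user (for example via sudo -u nifi) exercises the same read permission NiFi needs.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks that a jar is readable and contains the Impala driver class.
final class DriverJarCheck {
    // Entry path derived from the class name com.cloudera.impala.jdbc41.Driver.
    static final String DRIVER_ENTRY = "com/cloudera/impala/jdbc41/Driver.class";

    // Returns true only if the file exists, the current user can read it,
    // and the jar contains the driver class entry.
    static boolean looksLikeImpalaDriver(File jar) throws IOException {
        if (!jar.isFile() || !jar.canRead()) {
            return false;
        }
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(DRIVER_ENTRY) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DriverJarCheck /path/to/ImpalaJDBC41.jar");
            return;
        }
        boolean ok = looksLikeImpalaDriver(new File(args[0]));
        System.out.println(ok ? "OK: driver class found and jar is readable"
                              : "Problem: jar missing, unreadable, or driver class not found");
    }
}
```

A "Problem" result when run as nifi, but "OK" when run as your own user, points at the permission step rather than at the download.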

 

Create the Impala table and load the sample dataset into HDFS:

 

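  1. Use this dataset, tips.csv, and add it to HDFS (creating the target directory first if it does not exist):
    hdfs dfs -mkdir -p /user/hive/warehouse/tips
    hdfs dfs -put data/tips.csv /user/hive/warehouse/tips/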
  2. Create the Impala table:
    impala-shell -i <impala_daemon_hostname>:21000 -q '
      CREATE TABLE default.tips (
        `total_bill` FLOAT,
        `tip` FLOAT,
        `sex` STRING,
        `smoker` STRING,
        `day` STRING,
        `time` STRING,
        `size` TINYINT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
      LOCATION "hdfs:///user/hive/warehouse/tips/";'

 * These steps were taken from this article.

 

Configure NiFi to interact with Impala:

In NiFi, drag an ExecuteSQL processor onto the canvas.

 

Configure the Database Connection Pooling Service property on the ExecuteSQL processor.

This property points to the DBCPConnectionPool controller service, which you need to configure.

 

The driver documentation explains the different connection settings you can pass, including options for an Impala that is TLS-secured and/or Kerberized. In my example, I connect to an Impala that uses both TLS and Kerberos.

 

In the controller services section, configure your DBCPConnectionPool with the following properties:

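  • Database Connection URL

    My example:

    jdbc:impala://YourImpalaHostFQDN:YourPort

    YourPort is the port the Impala daemon exposes to JDBC (HiveServer2) clients, 21050 by default, not the impala-shell port 21000.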

 

  • Database Driver Class Name

    com.cloudera.impala.jdbc41.Driver

 

  • Database Driver Location(s)

    The full path to the JDBC driver (ImpalaJDBC41.jar) you downloaded and extracted earlier.
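To see how those three properties fit together, here is a rough standalone Java equivalent: load the driver class from the jar named as the driver location, instantiate it, and hand it the connection URL. This is only a sketch of the idea under that assumption, not NiFi's DBCPConnectionPool implementation, and ImpalaDriverLoader and its methods are hypothetical names. It calls Driver.connect directly because java.sql.DriverManager skips drivers whose class is not visible from the caller's class loader, which is the case for a driver loaded from a separate jar.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical sketch: driver location + driver class name + connection URL, done by hand.
final class ImpalaDriverLoader {
    static final String DRIVER_CLASS = "com.cloudera.impala.jdbc41.Driver";

    // Loads and instantiates the driver class from the given jar.
    // Throws ClassNotFoundException if the jar does not contain the class.
    static Driver loadDriver(File jar, String className) throws Exception {
        if (!jar.isFile()) {
            throw new IllegalArgumentException("Driver jar not found: " + jar);
        }
        // The loader is intentionally left open: the driver may load more classes lazily.
        URLClassLoader loader = new URLClassLoader(
                new URL[] {jar.toURI().toURL()}, ImpalaDriverLoader.class.getClassLoader());
        Class<?> driverClass = Class.forName(className, true, loader);
        return (Driver) driverClass.getDeclaredConstructor().newInstance();
    }

    // Uses the driver directly instead of going through DriverManager.
    static Connection connect(Driver driver, String url, Properties props) throws SQLException {
        Connection conn = driver.connect(url, props);
        if (conn == null) { // Driver.connect returns null for URLs it does not handle
            throw new SQLException("Driver does not accept URL: " + url);
        }
        return conn;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ImpalaDriverLoader /path/to/ImpalaJDBC41.jar jdbc:impala://host:port");
            return;
        }
        Driver driver = loadDriver(new File(args[0]), DRIVER_CLASS);
        try (Connection conn = connect(driver, args[1], new Properties())) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}
```

If this connects from the NiFi host but the controller service does not, the difference is likely in the NiFi configuration rather than in the driver or URL.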

 

Back in the ExecuteSQL processor, add your SQL command. For this example, we run a simple select query by setting the SQL select query property to:

select * from default.tips
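Outside NiFi, the same query can be run over plain JDBC, which can be handy for checking the table and the connection before wiring up the flow. A minimal sketch, assuming you already have an open java.sql.Connection to Impala; TipsQuery is a hypothetical name, and it only mirrors the query step, not how ExecuteSQL turns the result set into FlowFiles.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: run the article's query and collect each row as column -> value.
final class TipsQuery {
    static final String SQL = "select * from default.tips";

    // Executes the query on an already-open connection; closes the statement and result set.
    static List<Map<String, Object>> run(Connection conn, String sql) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            return readRows(rs);
        }
    }

    // Reads every remaining row, keyed by the column label the driver reports.
    static List<Map<String, Object>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int columns = md.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>(); // keeps column order
            for (int i = 1; i <= columns; i++) { // JDBC column indexes are 1-based
                row.put(md.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Materializing every row in memory is fine for the small tips dataset; for large tables you would process rows inside the loop instead.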

 

That should be all you need.

If your Impala uses TLS and/or Kerberos, check the driver documentation for the options that apply to you. For reference, this was my connection string for an Impala secured with both TLS and Kerberos:

 

jdbc:impala://MyImpalaHost:21050;AuthMech=1;KrbHostFQDN=MyImpalaHostFQDN;KrbServiceName=impala;ssl=1;SSLTrustStore=/My/JKS/Truststore;SSLTrustStorePwd=YourJKSPassword
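If you generate the connection string rather than typing it, a small helper keeps the semicolon-separated key=value layout consistent. This Java sketch (ImpalaUrl is a hypothetical name) only reproduces the format of the string above: the property keys are the ones shown there, and AuthMech=1 is the value the driver documentation assigns to Kerberos. Check that documentation before adding any other keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for building Impala JDBC URLs in the format used above.
final class ImpalaUrl {
    // Builds jdbc:impala://host:port followed by ;key=value pairs in insertion order.
    static String build(String host, int port, Map<String, String> props) {
        StringBuilder sb = new StringBuilder("jdbc:impala://").append(host).append(':').append(port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Properties for a TLS + Kerberos Impala, using the same keys as the example string.
    static Map<String, String> tlsKerberos(String krbHostFqdn, String trustStore, String trustStorePwd) {
        Map<String, String> p = new LinkedHashMap<>(); // preserves key order in the URL
        p.put("AuthMech", "1");          // Kerberos authentication
        p.put("KrbHostFQDN", krbHostFqdn);
        p.put("KrbServiceName", "impala");
        p.put("ssl", "1");               // enable TLS
        p.put("SSLTrustStore", trustStore);
        p.put("SSLTrustStorePwd", trustStorePwd);
        return p;
    }
}
```

For example, ImpalaUrl.build("MyImpalaHost", 21050, ImpalaUrl.tlsKerberos("MyImpalaHostFQDN", "/My/JKS/Truststore", "YourJKSPassword")) returns the connection string shown above with the truststore path as passed in.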

 

Version history: revision 5 of 5, last updated 04-30-2020 12:47 AM