
 

In this article, I document how to use CFM 1.0.1.0 to interact with Apache Impala. The steps also apply if you are using HDF / Apache NiFi.

 

The latest official Impala JDBC driver that works with NiFi is version 2.6.4; use 2.6.4 or below.

At the time of this writing, any newer driver causes class conflicts between the NiFi JVM and the driver's own use of log4j.

 

Prerequisites

 

Downloading and extracting the JDBC driver:

 

  1. The JDBC drivers can be found on the Impala JDBC Connector 2.6.15 for Cloudera Enterprise download page.
  2. Select version 2.6.4. If a version higher than 2.6.15 becomes available in the future, use that instead.
  3. Download and extract 2.6.4, and make a note of the extraction path.
  4. Ensure that the user that runs the NiFi JVM (nifi) has permission to read that path.
  5. The jar file that you will use is called ImpalaJDBC41.jar.
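
The permission requirement in step 4 can be checked from the command line. The following is a minimal sketch, not part of the original steps: the function name check_driver_jar is hypothetical, and the jar path you pass in is whatever location you extracted the driver to.

```shell
# Hypothetical helper: report whether the user running this shell
# can read the driver jar at the given path.
check_driver_jar() {
  local jar_path="$1"
  if [ -r "$jar_path" ]; then
    echo "readable: $jar_path"
  else
    echo "NOT readable: $jar_path" >&2
    return 1
  fi
}
```

Because NiFi needs to read the jar as the nifi user, run the check as that user. Assuming sudo is available, something like `sudo -u nifi test -r <path to ImpalaJDBC41.jar> && echo OK` performs the same test.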

 

Create the Impala table and load the sample dataset into HDFS:

 

  1. Download the tips.csv dataset and add it to HDFS:
    hdfs dfs -put data/tips.csv /user/hive/warehouse/tips/
  2. Create your Impala table:
impala-shell -i <impala_daemon_hostname>:21000 -q '
  CREATE TABLE default.tips (
    `total_bill` FLOAT,
    `tip` FLOAT,
    `sex` STRING,
    `smoker` STRING,
    `day` STRING,
    `time` STRING,
    `size` TINYINT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
  LOCATION "hdfs:///user/hive/warehouse/tips/";'

 * These steps were taken from this article.
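
Before moving on to NiFi, it can help to confirm that Impala can actually read the file through the new table. This is a minimal sketch that reuses the impala-shell flags from the command above; the function name verify_tips_table is hypothetical, and the host argument is a placeholder for your own Impala daemon.

```shell
# Hypothetical helper: run a row count against default.tips.
# $1 is the Impala daemon hostname; port 21000 matches the
# impala-shell port used in the CREATE TABLE step.
verify_tips_table() {
  local host="$1"
  impala-shell -i "${host}:21000" -q 'SELECT COUNT(*) FROM default.tips;'
}
```

Call it as `verify_tips_table <impala_daemon_hostname>`. A non-zero count suggests the CSV landed in the table location. On a secured cluster, impala-shell will likely need additional authentication options that are not shown here.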

 

Configure NiFi to interact with Impala:

On the NiFi canvas, add an ExecuteSQL processor.

 

Configure the Database Connection Pooling Service property on the ExecuteSQL processor

This property points to the DBCPConnectionPool controller service that you will need to configure.

 

The driver documentation explains the available connection settings well. If your Impala is secured with TLS and/or Kerberos, the driver has options for both. In my example, I am interacting with a TLS-secured, Kerberized Impala.

 

In the controller services section, configure your DBCPConnectionPool and set the following properties:

  • Database Connection URL

         My example: 

 

jdbc:impala://YourImpalaHostFQDN:YourPort

 

  • Database Driver Class Name

 

com.cloudera.impala.jdbc41.Driver

 

  • Database Driver Location(s)

 Set this to the full path of the JDBC driver (ImpalaJDBC41.jar) you downloaded earlier.

 

Back in the ExecuteSQL processor, add your SQL command. For this example, we run a simple select query by setting the SQL select query property to: select * from default.tips

 

That should be all you need.

If you are connecting to an Impala secured with TLS and/or Kerberos, check the driver documentation for the options that apply to you. For reference, this is what my connection string looked like when connecting to a TLS-secured, Kerberized Impala:

 

jdbc:impala://MyImpalaHost:21050;AuthMech=1;KrbHostFQDN=MyImpalaHostFQDN;KrbServiceName=impala;ssl=1;SSLTrustStore=/My/JKS/Trustore;SSLTrustStorePwd=YourJKSPassword
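
Since the connection string above is long and easy to mistype, one option is to assemble it from its parts. The sketch below only reproduces the options shown in my example (AuthMech=1, KrbServiceName=impala, ssl=1). The function name build_impala_jdbc_url and its argument order are hypothetical, and your environment may need different driver options.

```shell
# Hypothetical helper: assemble a TLS + Kerberos Impala JDBC URL.
# Arguments: host, port, Kerberos host FQDN, truststore path, truststore password.
build_impala_jdbc_url() {
  local host="$1" port="$2" krb_fqdn="$3" truststore="$4" truststore_pwd="$5"
  printf 'jdbc:impala://%s:%s;AuthMech=1;KrbHostFQDN=%s;KrbServiceName=impala;ssl=1;SSLTrustStore=%s;SSLTrustStorePwd=%s\n' \
    "$host" "$port" "$krb_fqdn" "$truststore" "$truststore_pwd"
}

# Example call using the placeholder values from the connection string above.
build_impala_jdbc_url MyImpalaHost 21050 MyImpalaHostFQDN /My/JKS/Trustore YourJKSPassword
```

The printed value is what goes into the Database Connection URL property of the DBCPConnectionPool.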

 

Comments
Contributor

Tested. Works. Awesome.

Explorer

Does anyone know if there is a way to use Impala over TLS that doesn't require putting passwords in cleartext? (CFM 2.0.4)

New Contributor

Can I use multiple Impala daemons in the connection string, or is there any other way to use multiple Impala daemons? (CDP 7.1.7)