This article explains how to set up the Hive Warehouse Connector (HWC) in CDP Public Cloud (tested with CDP Public Cloud Datahub Runtime 7.1).

Note: you must have admin privileges on your Datahub cluster to perform this configuration.

Step 1: Whitelist the path to the HWC jar

  1. In your CDP Datahub, open Cloudera Manager (CM) via the link in the management console.
  2. In CM, go to your cluster > Livy > Configuration, and search for livy-conf/livy.conf.
  3. Add the following safety valve to the configuration file:
    livy.file.local-dir-whitelist=/path_for_hwc/
    In our example, we are using the /tmp/ folder (the resulting entry is shown after this list).
  4. Restart the Livy service via CM to propagate the configuration.
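
With /tmp whitelisted, the safety valve entry is simply the following line (a concrete instance of the template above; substitute your own folder if you chose a different path):

    livy.file.local-dir-whitelist=/tmp/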

Step 2: Copy the HWC jar to the whitelisted location

  1. Find the hostname of the node where Livy is installed (master3 in this example).
  2. Connect to the node using your workload user name and password, e.g.:
    ssh pvidal@viz-data-engineering-master3.viz-cdp.a465-9q4k.cloudera.site
  3. Find the HWC jar:
    [pvidal@viz-data-engineering-master3 /]$ find / -name "*hive-warehouse-connector*" 2>/dev/null
    /opt/cloudera/parcels/CDH-7.1.0-1.cdh7.1.0.p0.1922354/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.0.0-714.jar
    /opt/cloudera/parcels/CDH-7.1.0-1.cdh7.1.0.p0.1922354/jars/hive-warehouse-connector-assembly-1.0.0.7.1.0.0-714.jar
  4. Copy it and add the right permissions:
    cp /opt/cloudera/parcels/CDH-7.1.0-1.cdh7.1.0.p0.1922354/jars/hive-warehouse-connector-assembly-1.0.0.7.1.0.0-714.jar /tmp
    chmod a+rw /tmp/hive-warehouse-connector-assembly-1.0.0.7.1.0.0-714.jar
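
Optionally, verify that the jar landed in the whitelisted folder and is world-readable (a plain shell check, matching the /tmp example above):

    ls -l /tmp/hive-warehouse-connector-assembly-*.jar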

Step 3: Add the jar path to the Zeppelin Livy interpreter

  1. From your management console, open Zeppelin.
  2. Go to the top right, and configure your interpreters.
  3. Edit the livy interpreter and add the following properties:
    HWC Jar Location
    Name: livy.spark.jars
    Value: file:///[LOCATION_OF_YOUR_HWC_JAR]
    Hive JDBC URL
    Name: livy.spark.sql.hive.hiveserver2.jdbc.url
    Value: [JDBC_URL_FROM_MANAGEMENT_CONSOLE];user=[your_user];password=[your_password]
    Read via LLAP
    Name: livy.spark.datasource.hive.warehouse.read.via.llap
    Value: false
    JDBC mode
    Name: livy.spark.datasource.hive.warehouse.read.jdbc.mode
    Value: client
    Staging Dir
    Name: livy.spark.datasource.hive.warehouse.load.staging.dir
    Value: /tmp
    Metastore URI
    Name: livy.spark.datasource.hive.warehouse.metastoreUri
    Value: [VALUE_FROM_HIVE_SITE_XML]
  4. You can find the JDBC URL in your Datahub management console, and the metastore URI in hive-site.xml, which you can download from CM by going to Your Cluster > Hive > Download Client Configuration.
  5. Save your configuration, and restart your interpreter.
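
Before writing any real code, you can smoke-test the interpreter from a Zeppelin note. This is a minimal sketch, assuming the livy Scala interpreter is bound with the standard %livy.spark prefix (adjust to your binding):

%livy.spark
// Build an HWC session; this only works if livy.spark.jars points at the HWC jar
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// If the JDBC URL and metastore URI are correct, this lists your Hive databases
hive.showDatabases().show()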

Step 4: Code away

Here is a simple example of Spark reading from a CSV and writing to a Hive table using HWC:

Read raw location data

val locationDf = spark.read
  .options(Map("inferSchema" -> "true", "delimiter" -> ",", "header" -> "true"))
  .csv("s3a://viz-cdp-bucket/raw/locations.csv")
locationDf.printSchema()

Set up the HWC session

import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
val hive = HiveWarehouseSession.session(spark).build()

Create database and save dataset to table

hive.executeUpdate("CREATE DATABASE worldwidebank")
hive.setDatabase("worldwidebank")
// HIVE_WAREHOUSE_CONNECTOR is a String constant provided by the wildcard import above
locationDf.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", "locations").save()

Query data

val ds = hive.sql("select * from locations limit 10")
ds.show()
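
To confirm the write, the same session also exposes catalog helpers (these calls assume the session and database set in the previous snippets are still active):

// List the tables in the worldwidebank database
hive.showTables().show()
// Describe the schema of the new locations table
hive.describeTable("locations").show(false)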