In this article, we will learn how to register a Hive UDF using the Spark HiveWarehouseSession (HWC).

  1. Download and build the Spark Hive UDF example.
    git clone https://github.com/rangareddy/spark-hive-udf
    cd spark-hive-udf
    mvn clean package -DskipTests
  2. Copy target/spark-hive-udf-1.0.0-SNAPSHOT.jar to the edge node.
  3. Log in to the edge node and upload spark-hive-udf-1.0.0-SNAPSHOT.jar to an HDFS location, for example, /tmp.
    hdfs dfs -put ./spark-hive-udf-1.0.0-SNAPSHOT.jar /tmp
  4. Launch spark-shell with the HWC (Hive Warehouse Connector) parameters.
    spark-shell \
      --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-*.jar \
      --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://hiveserver2_host1:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' \
      --conf spark.sql.hive.hwc.execution.mode=spark \
      --conf spark.datasource.hive.warehouse.metastoreUri='thrift://metastore_host:9083' \
      --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' \
      --conf spark.datasource.hive.warehouse.user.name=hive \
      --conf spark.datasource.hive.warehouse.password=hive \
      --conf spark.datasource.hive.warehouse.smartExecution=false \
      --conf spark.datasource.hive.warehouse.read.via.llap=false \
      --conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
      --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
      --conf spark.security.credentials.hiveserver2.enabled=false \
      --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
  5. Create the HiveWarehouseSession.
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._
    
    val hive = HiveWarehouseSession.session(spark).build()
  6. Execute the following statement to register a Hive UDF.
    hive.executeUpdate("CREATE FUNCTION uppercase AS 'com.ranga.spark.hive.udf.UpperCaseUDF' USING JAR 'hdfs:///tmp/spark-hive-udf-1.0.0-SNAPSHOT.jar'")
  7. Test the registered function, for example uppercase.
    scala> val data1 = hive.executeQuery("select id, uppercase(name), age, salary from employee")
    
    scala> data1.show()
    +---+-----------------------+---+---------+
    | id|default.uppercase(name)|age|   salary|
    +---+-----------------------+---+---------+
    |  1|                  RANGA| 32| 245000.3|
    |  2|               NISHANTH|  2| 345000.1|
    |  3|                   RAJA| 32|245000.86|
    |  4|                   MANI| 14|  45000.0|
    +---+-----------------------+---+---------+
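For reference, the UDF class registered above can be sketched as follows. This is only an approximation: the actual com.ranga.spark.hive.udf.UpperCaseUDF in the repository may differ, and the UDF and Text classes below are minimal stand-ins for Hive's org.apache.hadoop.hive.ql.exec.UDF and org.apache.hadoop.io.Text so the sketch compiles without the cluster jars on the classpath.

```scala
// Stand-ins for the Hive classes, so this sketch is self-contained.
// A real UDF extends org.apache.hadoop.hive.ql.exec.UDF and uses
// org.apache.hadoop.io.Text instead of these.
class UDF
class Text(s: String) { override def toString: String = s }

// Hypothetical shape of com.ranga.spark.hive.udf.UpperCaseUDF: Hive calls
// the evaluate() method once per row with the column value.
class UpperCaseUDF extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null else new Text(input.toString.toUpperCase)
}

// Usage: mirrors what uppercase(name) does in the query above.
println(new UpperCaseUDF().evaluate(new Text("ranga"))) // prints RANGA
```

In a real Hive UDF, returning null for null input (as above) matters, because Hive will pass nulls through for missing column values.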

Thanks for reading this article.
