Created on 08-24-2021 02:22 AM - edited on 08-24-2021 08:34 PM by subratadas
In this article, we will learn how to register Hive UDFs using the Spark HiveWarehouseSession (HWC).
git clone https://github.com/rangareddy/spark-hive-udf
cd spark-hive-udf
mvn clean package -DskipTests
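The repository above bundles simple Hive UDF classes such as com.ranga.spark.hive.udf.UpperCaseUDF. As a rough, hypothetical sketch of what such a class looks like (a real Hive UDF extends org.apache.hadoop.hive.ql.exec.UDF from hive-exec; that base class is omitted here so the snippet stands on its own):

```scala
// Hypothetical sketch of the UpperCaseUDF logic. In a real Hive UDF the
// class extends org.apache.hadoop.hive.ql.exec.UDF and Hive calls its
// evaluate() method once per row; the base class is omitted here to keep
// the snippet self-contained.
class UpperCaseUDF {
  // Return the upper-cased input, passing nulls through unchanged.
  def evaluate(input: String): String =
    if (input == null) null else input.toUpperCase
}
```

Once packaged into the jar, the class is referenced by its fully qualified name in the CREATE FUNCTION statement.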
hdfs dfs -put ./target/spark-hive-udf-1.0.0-SNAPSHOT.jar /tmp
spark-shell \
  --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-*.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://hiveserver2_host1:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' \
  --conf spark.sql.hive.hwc.execution.mode=spark \
  --conf spark.datasource.hive.warehouse.metastoreUri='thrift://metastore_host:9083' \
  --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' \
  --conf spark.datasource.hive.warehouse.user.name=hive \
  --conf spark.datasource.hive.warehouse.password=hive \
  --conf spark.datasource.hive.warehouse.smartExecution=false \
  --conf spark.datasource.hive.warehouse.read.via.llap=false \
  --conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
  --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
  --conf spark.security.credentials.hiveserver2.enabled=false \
  --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._

val hive = HiveWarehouseSession.session(spark).build()
hive.executeUpdate("CREATE FUNCTION uppercase AS 'com.ranga.spark.hive.udf.UpperCaseUDF' USING JAR 'hdfs:///tmp/spark-hive-udf-1.0.0-SNAPSHOT.jar'")
scala> val data1 = hive.executeQuery("select id, uppercase(name), age, salary from employee")

scala> data1.show()
+---+-----------------------+---+---------+
| id|default.uppercase(name)|age|   salary|
+---+-----------------------+---+---------+
|  1|                  RANGA| 32| 245000.3|
|  2|               NISHANTH|  2| 345000.1|
|  3|                   RAJA| 32|245000.86|
|  4|                   MANI| 14|  45000.0|
+---+-----------------------+---+---------+
Thanks for reading this article.