In this article, we will learn how to register a Hive UDF using the Spark HiveWarehouseSession (HWC).

  1. Download and build the Spark Hive UDF example.
    git clone https://github.com/rangareddy/spark-hive-udf
    cd spark-hive-udf
    mvn clean package -DskipTests
  2. Copy target/spark-hive-udf-1.0.0-SNAPSHOT.jar to the edge node.
  3. Log in to the edge node and upload spark-hive-udf-1.0.0-SNAPSHOT.jar to an HDFS location, for example, /tmp.
    hdfs dfs -put ./spark-hive-udf-1.0.0-SNAPSHOT.jar /tmp
  4. Launch spark-shell with the HWC (Hive Warehouse Connector) parameters.
    spark-shell \
      --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-*.jar \
      --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://hiveserver2_host1:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' \
      --conf spark.sql.hive.hwc.execution.mode=spark \
      --conf spark.datasource.hive.warehouse.metastoreUri='thrift://metastore_host:9083' \
      --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' \
      --conf spark.datasource.hive.warehouse.user.name=hive \
      --conf spark.datasource.hive.warehouse.password=hive \
      --conf spark.datasource.hive.warehouse.smartExecution=false \
      --conf spark.datasource.hive.warehouse.read.via.llap=false \
      --conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
      --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
      --conf spark.security.credentials.hiveserver2.enabled=false \
      --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
  5. Create the HiveWarehouseSession.
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._
    
    val hive = HiveWarehouseSession.session(spark).build()
  6. Execute the following statement to register a Hive UDF.
    hive.executeUpdate("CREATE FUNCTION uppercase AS 'com.ranga.spark.hive.udf.UpperCaseUDF' USING JAR 'hdfs:///tmp/spark-hive-udf-1.0.0-SNAPSHOT.jar'")
  7. Test the registered function, for example uppercase.
    scala> val data1 = hive.executeQuery("select id, uppercase(name), age, salary from employee")
    
    scala> data1.show()
    +---+-----------------------+---+---------+
    | id|default.uppercase(name)|age|   salary|
    +---+-----------------------+---+---------+
    |  1|                  RANGA| 32| 245000.3|
    |  2|               NISHANTH|  2| 345000.1|
    |  3|                   RAJA| 32|245000.86|
    |  4|                   MANI| 14|  45000.0|
    +---+-----------------------+---+---------+
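For reference, the UDF class registered above can be sketched as follows. This is only an approximation: the actual com.ranga.spark.hive.udf.UpperCaseUDF in the repository may differ, and the UDF and Text classes below are minimal stand-ins for Hive's org.apache.hadoop.hive.ql.exec.UDF and org.apache.hadoop.io.Text so the sketch compiles without the cluster jars on the classpath.

```scala
// Stand-ins for the Hive classes, so this sketch is self-contained.
// A real UDF extends org.apache.hadoop.hive.ql.exec.UDF and uses
// org.apache.hadoop.io.Text instead of these.
class UDF
class Text(s: String) { override def toString: String = s }

// Hypothetical shape of com.ranga.spark.hive.udf.UpperCaseUDF: Hive calls
// the evaluate() method once per row with the column value.
class UpperCaseUDF extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null else new Text(input.toString.toUpperCase)
}

// Usage: mirrors what uppercase(name) does in the query above.
println(new UpperCaseUDF().evaluate(new Text("ranga"))) // prints RANGA
```

In a real Hive UDF, returning null for null input (as above) matters, because Hive will pass nulls through for missing column values.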

Thanks for reading this article.
