Abstract

This article focuses on creating a custom Hive UDF in the Scala programming language. IntelliJ IDEA 2016 was used to create the project and artifacts. Creation and testing of the UDF was performed on the Hortonworks Sandbox 2.4 running on Oracle VirtualBox. The full source code for the project can be found here.

Create project

Using IntelliJ IDEA, create a new project with the following configuration:

Project type: Scala
Project name: shoehorn
Project SDK: 1.8 (Java version 1.8)
Scala SDK: scala-sdk-2.11.8

Create package

Once the project is created, add a new package named udf under /shoehorn/src/.

Set dependencies

Edit your project structure and add the hive-exec-1.2.1.jar file to the module dependencies. (An equivalent sbt declaration is sketched after the class skeleton below.)

Create artifact

Edit your project structure and add an artifact of type JAR > "module with dependencies". For this example, I am adding the artifact to the shoehorn module.

Create Scala class

The first thing we need to do is create a Scala class that extends org.apache.hadoop.hive.ql.exec.UDF. For this example, the class name is ScalaUDF.

package udf
import org.apache.hadoop.hive.ql.exec.UDF
class ScalaUDF extends UDF {
}
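As an aside, if the project is built with sbt rather than through IntelliJ's module settings, the hive-exec dependency from the "Set dependencies" step can be declared in a build.sbt along these lines. This is only a minimal sketch under assumed settings; the version string and the "provided" scope are assumptions, not taken from the original article.

// build.sbt (illustrative sketch, not from the original article)
name := "shoehorn"
version := "0.1"
scalaVersion := "2.11.8"

// hive-exec provides org.apache.hadoop.hive.ql.exec.UDF; marked "provided"
// on the assumption that the cluster supplies Hive's own jars at runtime
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.2.1" % "provided"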
Define function

Now we add our function definition inside the ScalaUDF class. For this example, I'm creating a simple function that takes an input column of string type and returns the length of that string. This function is for demonstration purposes only, as Hive already ships a built-in function that provides the same functionality.

package udf
import org.apache.hadoop.hive.ql.exec.UDF
class ScalaUDF extends UDF {
  def evaluate(str: String): Int = {
    str.length()
  }
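  // Not in the original article: Hive's simple UDF API resolves overloaded
  // evaluate methods by the arguments used in the query, and it passes a SQL
  // NULL as a null reference, so the one-argument version above would throw a
  // NullPointerException on a NULL column. A hedged sketch of a two-argument
  // overload that returns a caller-supplied fallback length for NULL input:
  def evaluate(str: String, fallback: Int): Int =
    if (str == null) fallback else str.length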
}

Build the artifact

Using IntelliJ IDEA, select Build > "Make Project" from the menu. Next, select Build > "Build Artifacts...". This creates the /shoehorn/out/artifacts/shoehorn_jar/shoehorn.jar file.

Create Hive UDF

Upload the shoehorn.jar file to HDFS. You may need to change the file permissions depending on which user will be executing Hive commands. For this example, I've uploaded the file to my local Hortonworks Sandbox at hdfs:///jars/shoehorn.jar.

In Hive, run the following command to register a new UDF. Note: this can be done in the Hive view in Ambari or through the Hive CLI.

create function getScalaLength as 'udf.ScalaUDF' using jar 'hdfs:///jars/shoehorn.jar';

Finally, we can test our UDF using the following HQL in Hive:

select phone_number, getScalaLength(phone_number) from xademo.customer_details limit 5;

The result set returned:

PHONE_NUM 9
5553947406 10
7622112093 10
5092111043 10
9392254909 10
Time taken: 6.38 seconds, Fetched: 5 row(s)
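As a quick sanity check of the Scala logic itself, separate from Hive, the evaluate method can also be called directly from a small test object or the Scala REPL before packaging and uploading the jar. The snippet below is only an illustrative sketch and is not part of the original article; the object name is made up.

package udf

// Illustrative only: exercise the UDF logic locally before building the jar.
object ScalaUDFSmokeTest {
  def main(args: Array[String]): Unit = {
    val f = new ScalaUDF()
    println(f.evaluate("PHONE_NUM"))   // 9, matching the first row of the result set above
    println(f.evaluate("5553947406"))  // 10
  }
}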