Create the HDFS directory and upload the two necessary libraries.
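A sketch of those steps with the hdfs dfs CLI, assuming the two JARs sit in your working directory and using the /udf path referenced by the CREATE FUNCTION statement below:

hdfs dfs -mkdir -p /udf
hdfs dfs -put urldetector-1.0-jar-with-dependencies.jar /udf/
hdfs dfs -put url-detector-0.1.15.jar /udf/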
CREATE FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector' USING JAR 'hdfs:///udf/urldetector-1.0-jar-with-dependencies.jar', JAR 'hdfs:///udf/url-detector-0.1.15.jar';
Create the Hive function with the HDFS-referenced JARs
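To confirm the function registered, you can describe it with standard HiveQL; the output echoes the @Description annotation shown below:

describe function urldetector;
describe function extended urldetector;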
select http_user_agent, urldetector(remote_host) as urls, remote_host from accesslogs limit 100;
Test the UDF via HiveQL
@Description(name = "urldetector", value = "_FUNC_(string) - detects urls")
public final class URLDetector extends UDF { }
Java Header for the UDF
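For reference, here is what the full class might look like, as a sketch assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class and the UrlDetector/UrlDetectorOptions/Url API documented in the LinkedIn library's README; the evaluate body and variable names are illustrative, not the author's exact implementation:

package com.dataflowdeveloper.detection;

import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import com.linkedin.urls.Url;
import com.linkedin.urls.detection.UrlDetector;
import com.linkedin.urls.detection.UrlDetectorOptions;

@Description(name = "urldetector", value = "_FUNC_(string) - detects urls")
public final class URLDetector extends UDF {

    // Scan the input string and return any URLs found, comma-separated.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        UrlDetector detector = new UrlDetector(input.toString(), UrlDetectorOptions.Default);
        List<Url> found = detector.detect();
        StringBuilder urls = new StringBuilder();
        for (Url url : found) {
            if (urls.length() > 0) {
                urls.append(',');
            }
            urls.append(url.getFullUrl());
        }
        return new Text(urls.toString());
    }
}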
set hive.cli.print.header=true;
add jar urldetector-1.0-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector';
select urldetector(description) from sample_07 limit 100;
You can test with a temporary function through the Hive CLI before making the function permanent.
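Once the temporary version returns sensible results, you can drop it (standard HiveQL) and rely on the permanent function created above:

drop temporary function if exists urldetector;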
mvn compile assembly:single
Build the Jar File for Deployment
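The assembly:single goal only produces the -jar-with-dependencies artifact if the pom configures the Maven assembly plugin with that descriptor; a typical plugin block (illustrative, not necessarily the project's actual pom) looks like:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>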
The URL detection library from LinkedIn (https://github.com/linkedin/URL-Detector) must be compiled first; the resulting JAR is both a compile-time dependency of the UDF and one of the two JARs deployed to Hive.
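Assuming the library builds with Maven, compiling it and installing the JAR into your local repository so the UDF project can depend on it might look like:

git clone https://github.com/linkedin/URL-Detector.git
cd URL-Detector
mvn clean install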