Created on 07-15-2016 11:35 AM
su hdfs
hadoop fs -mkdir /udf
hadoop fs -put urldetector-1.0-jar-with-dependencies.jar /udf/
hadoop fs -put libs/url-detector-0.1.15.jar /udf/
hadoop fs -chown -R hdfs /udf
hadoop fs -chgrp -R hdfs /udf
hadoop fs -chmod -R 775 /udf
Create the HDFS directory and upload the two necessary libraries.
CREATE FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector'
USING JAR 'hdfs:///udf/urldetector-1.0-jar-with-dependencies.jar',
JAR 'hdfs:///udf/url-detector-0.1.15.jar';
Create the Hive function with those HDFS-referenced JARs.
select http_user_agent, urldetector(remote_host) as urls, remote_host from AccessLogs limit 100;
Test the UDF via Hive QL
@Description(name="urldetector", value="_FUNC_(string) - detects urls")
public final class URLDetector extends UDF {}
Java Header for the UDF
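The header above leaves the class body empty. As a rough illustration of the shape such a body could take, here is a minimal, self-contained sketch. It is not the author's implementation and does not use the LinkedIn library: the class name UrlDetectSketch is hypothetical, and a simplified java.util.regex pattern stands in for the real detector so the example runs without Hive or URL-Detector on the classpath. The only convention borrowed from Hive is the evaluate method name, which Hive resolves by reflection on classes that extend UDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in: not the author's URLDetector and not the LinkedIn library.
final class UrlDetectSketch {

    // Assumption: a simplified pattern that matches http/https followed by
    // characters other than whitespace, quotes, or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Mirrors the evaluate(...) method that Hive looks up on a UDF class.
    // Returns the URLs found in the input joined by ", ", or null for null input.
    public String evaluate(final String input) {
        if (input == null) {
            return null;
        }
        List<String> found = new ArrayList<>();
        Matcher matcher = URL.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return String.join(", ", found);
    }
}
```

In a real UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF and delegate to the LinkedIn detector instead of the regex; see the full source linked under References for the actual code.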
set hive.cli.print.header=true;
add jar urldetector-1.0-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION urldetector as 'com.dataflowdeveloper.detection.URLDetector';
select urldetector(description) from sample_07 limit 100;
You can test with a temporary function through the Hive CLI before making the function permanent.
mvn compile assembly:single
Build the JAR file for deployment.
The library from LinkedIn (https://github.com/linkedin/URL-Detector) must be compiled and the JAR used in your code and deployed to Hive.
References
See: https://github.com/tspannhw/URLDetector for full source code.