06-15-2016
08:35 PM
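Is there a firewall or any other network device between the source and the destination? Does your user have full permissions to run distcp? Are you moving a large amount of data over a slow network? The stack trace points to a connectivity problem: Caused by: java.net.SocketTimeoutException: connect timed out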
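One way to separate a network problem from a permissions problem is to try a plain TCP connection from the machine running distcp to the remote service. The following is only a sketch: the class name `ConnectCheck`, the 5-second timeout and the default port 8020 (a commonly used NameNode RPC port) are assumptions for illustration, so substitute the host and port from your own cluster configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper, not part of Hadoop: checks whether a TCP port can be reached. */
public final class ConnectCheck {

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // A SocketTimeoutException here often means packets are being dropped
            // (for example by a firewall); a ConnectException usually means the
            // host answered but nothing is listening on that port.
            System.err.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the remote host and port as arguments instead.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(canConnect(host, port, 5000) ? "reachable" : "unreachable");
    }
}
```

If the check times out from the distcp node but succeeds from a node inside the remote cluster, the network path is the more likely culprit than user permissions.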
06-15-2016
08:33 PM
- The Brickhouse Collection of UDFs from Klout, which includes functions for collapsing multiple rows into one, generating top K lists, a distributed cache, bloom counters, JSON functions and HBase tools.
- Facebook UDF Collection (HIVE-1545), including functions for unescape, find in an array and finding a max in a set of columns.
- UDF Collection for various string distances, text classification and other text mining.
- UDF for anonymizing data with Apache Pig.
- Hive UDF for various functions like array count
- Curve Computing UDF
- Ngram Functions UDF
- Hive UDFs Similar to Oracle Functions
- A collection of GeocodeIP, Haversine Distance and DecodeURL UDFs
- Hive Funnel Analysis UDF by Yahoo (tracking user conversion rates across actions)
- Hive UDF Collection by LivingSocial for Min and Max Date, MySQL Style Like, and more.
- Hive UDF with Yahoo Data Sketches, for the stochastic streaming algorithms called Data Sketches.
- Hive UDF to Count Business Days
- User Agent String Parser Hive UDF
- Date Range Generator Hive UDF
06-15-2016
07:56 PM
To check OneFS support for new versions of Ambari and HDP, visit the ECN and follow this page: https://community.emc.com/docs/DOC-37101 . One thing to note is that this guide is written for OneFS 7.2.x. In 8.0.0.0 the command-line structure was changed, so refer to the new CLI guide; the Hadoop section starts on page 997: http://www.emc.com/collateral/TechnicalDocument/docu65065.pdf . There is also a great article from EMC about HDP 2.4 on Isilon: http://hortonworks.com/blog/hortonworks-and-emc-powering-the-future-of-data-together/
06-15-2016
07:20 PM
1 Kudo
I was going to just make a REST call to the web service used in my NiFi flow.
My example is on GitHub with full scripts and source code.
So I created a semi-useful quick prototype Hive UDF in Java called ProfanityRemover that converts many non-business-friendly terms into asterisks (*). The term list is kept small for performance (about 2,000 entries, including some variations for spacing), but it blocks the common ones. It does have a higher incidence of false positives than you would like. To do this right, you could use a commercial API or write some machine learning.
Warning! src/main/resources and src/test/resources in github contain a list of offensive words.
Building a Hive UDF
To build an Eclipse project:
mvn eclipse:eclipse
To build:
./build.sh
To build for command-line usage (outside of Hive):
./buildfirst.sh
(or)
mvn clean compile assembly:single
which generates
target/deprofaner-1.0-jar-with-dependencies.jar
Copy deprofaner*jar to the directory you will run from, or to /usr/hdp/current/hive-client/lib/
mkdir -p /opt/demo/udf
Copy src/main/resources/terms.txt to /opt/demo/udf/terms.txt
In Hive:
hive> set hive.cli.print.header=true;
hive> add jar deprofaner-1.0-jar-with-dependencies.jar;
Added [deprofaner-1.0-jar-with-dependencies.jar] to class path
Added resources: [deprofaner-1.0-jar-with-dependencies.jar]
hive> CREATE TEMPORARY FUNCTION cleaner as 'com.dataflowdeveloper.deprofaner.ProfanityRemover';
OK
select cleaner('clean this <curseword> up now') from sample_07 limit 1;
OK
_c0
clean this **** up now
Time taken: 6.279 seconds, Fetched: 1 row(s)
Check the logs in /var/log/hive/hiveserver2.log. I set the Hive CLI print header option for more details on the output.
To make this a Permanent UDF
Run scripts/install.sh, which creates an HDFS directory with open permissions and puts our built JAR up there.
set hive.cli.print.header=true;
CREATE FUNCTION cleaner as 'com.dataflowdeveloper.deprofaner.ProfanityRemover' USING JAR 'hdfs:///udf/deprofaner-1.0-jar-with-dependencies.jar';
This is a working example of a Hive UDF. The primary code is pretty short:
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "profanityremover", value = "_FUNC_(string) - sanitizes text by replacing profanities ")
public final class ProfanityRemover extends UDF {
    /**
     * UDF Evaluation
     *
     * @param s Text passed in
     * @return Text cleaned
     */
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        String cleaned = Util.filterOutProfanity(s.toString());
        return new Text(cleaned);
    }
}
There's not much to writing a simple UDF, that is, one that extends the UDF class. There are some other classes to extend for more functionality, but for writing a basic function this works really well. You just need to implement one method: evaluate. Then you build a JAR; see build.sh and pom.xml for the Maven build details.
Deploy the JAR:
hive> add jar deprofaner-1.0-jar-with-dependencies.jar;
Create the function:
hive> CREATE TEMPORARY FUNCTION cleaner as 'com.dataflowdeveloper.deprofaner.ProfanityRemover';
Use it like any other function. Pretty cool.
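The evaluate method delegates to Util.filterOutProfanity, which is not shown in this post. Below is a minimal sketch of what such a helper could look like, and not the code from the repository: the class name `ProfanityFilterSketch`, the whole-word matching and the same-length asterisk replacement are all assumptions, and the term list is passed in directly rather than read from terms.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical stand-in for Util.filterOutProfanity; not the repository's code. */
public final class ProfanityFilterSketch {

    private final Pattern pattern;

    /** Builds one case-insensitive pattern that matches any listed term as a whole word. */
    public ProfanityFilterSketch(List<String> terms) {
        String alternation = terms.stream()
                .map(Pattern::quote) // treat each term as literal text
                .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
    }

    /** Replaces each matched term with asterisks of the same length. */
    public String filter(String input) {
        if (input == null) {
            return null;
        }
        Matcher matcher = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char[] stars = new char[matcher.group().length()];
            Arrays.fill(stars, '*');
            matcher.appendReplacement(out, new String(stars));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Mild placeholder terms stand in for the real word list.
        ProfanityFilterSketch f = new ProfanityFilterSketch(Arrays.asList("darn", "heck"));
        System.out.println(f.filter("clean this darn thing up now")); // clean this **** thing up now
        System.out.println(f.filter("Darning socks"));                // Darning socks
    }
}
```

Matching on word boundaries is one way to cut down the false positives mentioned earlier, since a term embedded in a longer word is left alone; the trade-off is that spacing variations have to be listed explicitly.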
06-15-2016
07:08 PM
Does anyone have any good Hive UDF templates or good starters for writing Hive UDFs? Also, is there a UI for building them, like an Eclipse or IntelliJ IDEA plugin? I am looking for something like start.spring.io or JHipster for big data.
Labels: Apache Hive
06-14-2016
03:14 PM
Hive UDFs similar to Oracle functions: https://github.com/nexr/hive-udf/wiki
06-14-2016
02:26 PM
Can you post the SQL and the table definition? You may have incorrect table properties, as a Parquet file is not the same as a CSV file. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
06-14-2016
01:48 PM
1 Kudo
There are a lot of Hive UDF libraries:
https://github.com/nexr/hive-udf/wiki
https://github.com/sharethrough/hive-udfs
https://github.com/yahoo/hive-funnel-udf
https://github.com/livingsocial/HiveSwarm
Hive UDF with Yahoo Data Sketches: http://datasketches.github.io/docs/Theta/ThetaHiveUDFs.html
06-13-2016
03:28 PM
It's in progress and hopefully will be in the next release https://community.emc.com/message/925027#925027
06-13-2016
01:42 PM
1 Kudo
I am curious if it produced useful information. https://github.com/linkedin/dr-elephant/wiki/User-Guide
Labels: Apache Hadoop