Rueedliner Hive UDF for Data Mining
Labels: Apache Hive
TimothySpann (Super Guru)
Created on 12-23-2016 03:38 PM
Repo Description
Rueedliner Hive UDF for Data Mining
distance - calculates the distance between two strings based on a selected algorithm (e.g. Levenshtein, Jaro-Winkler, NGramDistance).
suggestion - makes suggestions based on a text-based dictionary.
clean - removes whitespace and other characters from text.
urlextractor - extracts the first URL match from text.
classifier - classifies text based on a training set (naive Bayes classifier).
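
To make the first few functions more concrete, here is a minimal Python sketch of what distance (with the Levenshtein algorithm), clean, and urlextractor do conceptually. It is not the repo's code: the UDFs themselves run inside Hive, and the names levenshtein, clean, first_url, and URL_PATTERN are hypothetical, chosen only for this illustration. The whitespace rule and the URL pattern are assumptions as well; the repo may treat both differently.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def clean(text: str) -> str:
    """Assumed cleaning rule: collapse whitespace runs to one space
    and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


# Assumed, deliberately simple pattern: http(s) scheme followed by
# non-space characters. Trailing punctuation is not stripped.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def first_url(text: str):
    """Return the first URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None
```

For example, levenshtein("kitten", "sitting") returns 3, and first_url returns None when the text contains no URL.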
Repo Info
GitHub Repo URL: https://github.com/rueedlinger/hive-udf
GitHub account name: rueedlinger
Repo name: hive-udf
Version history
Last update: 12-23-2016 03:38 PM
Updated by: TimothySpann
Contributors: TimothySpann