Code Repositories

Repo Description

Rueedlinger Hive UDF for Data Mining

  • distance - calculates the distance between two strings based on a selected algorithm (e.g. Levenshtein, Jaro-Winkler, NGramDistance, etc.).
  • suggestion - suggests terms based on a text-based dictionary.
  • clean - cleans text of whitespace and other characters.
  • urlextractor - extracts the first URL match from text.
  • classifier - classifies text based on a training set (naive Bayes classifier).
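To make the behaviour described above concrete, here is a minimal Python sketch of three of the listed operations: Levenshtein distance, whitespace cleaning, and first-URL extraction. It is not the repository's code (the UDFs themselves run inside Hive). The function names, the URL pattern, and the choice to only collapse whitespace in `clean` are all assumptions made for illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def clean(text: str) -> str:
    """Collapse runs of whitespace into one space and trim both ends.
    (Assumption: the real UDF may strip other characters as well.)"""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical, deliberately simple pattern; real URL matching is more involved.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_first_url(text: str):
    """Return the first http(s) URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion), and `extract_first_url` returns only the first match even when the text contains several URLs.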
Repo Info
GitHub repo URL: https://github.com/rueedlinger/hive-udf
GitHub account name: rueedlinger
Repo name: hive-udf
Version history
Last update: 12-23-2016 03:38 PM