
Suggestions for a record linkage process




I have 30+ data sources, each with its own schema for "Person" data (some mix of SSN, DOB, first and last name, etc.). My goal is to use existing, preferably open-source, technologies or libraries to compare these records for record linkage, and then consolidate the data following master data management practices. My current plan is to ingest all of the data with NiFi, normalize it, and then compare the records using either a MapReduce job or an existing tool. Does anyone have suggestions for existing technologies that could help with this? All input is appreciated!
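
To make the normalize-then-compare idea concrete, here is a rough sketch of the flow in plain Python, using only the standard library. It is not a recommendation of a specific tool. The field names (`ssn`, `dob`, `first`, `last`, `source`), the blocking key, the equal name weights, and the 0.85 threshold are all assumptions chosen for illustration, and `difflib.SequenceMatcher` stands in for whatever string similarity a real linkage library would provide.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Map a raw record onto a common schema (field names are hypothetical)."""
    return {
        "source": record.get("source", ""),
        # Keep only digits so "123-45-6789" and "123456789" compare equal.
        "ssn": "".join(ch for ch in record.get("ssn", "") if ch.isdigit()),
        "dob": record.get("dob", "").strip(),
        "first": record.get("first", "").strip().lower(),
        "last": record.get("last", "").strip().lower(),
    }


def blocking_key(rec):
    """Only records sharing this key are compared, which avoids an all-pairs scan.

    Assumption: DOB plus last-name initial. Records whose DOB differs are never
    compared, so a real pipeline would likely combine several blocking passes.
    """
    return (rec["dob"], rec["last"][:1])


def similarity(a, b):
    """String similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def score(a, b):
    """Score a candidate pair: exact SSN match wins, else average name similarity."""
    if a["ssn"] and a["ssn"] == b["ssn"]:
        return 1.0
    return 0.5 * similarity(a["first"], b["first"]) + 0.5 * similarity(a["last"], b["last"])


def link(records, threshold=0.85):
    """Return (record_a, record_b, score) for pairs scoring at or above threshold."""
    blocks = defaultdict(list)
    for raw in records:
        rec = normalize(raw)
        blocks[blocking_key(rec)].append(rec)

    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            s = score(a, b)
            if s >= threshold:
                matches.append((a, b, s))
    return matches


if __name__ == "__main__":
    sample = [
        {"source": "A", "ssn": "123-45-6789", "dob": "1980-01-02", "first": "John", "last": "Smith"},
        {"source": "B", "ssn": "123456789", "dob": "1980-01-02", "first": "Jon", "last": "Smith "},
        {"source": "C", "ssn": "", "dob": "1975-05-05", "first": "Mary", "last": "Jones"},
    ]
    for a, b, s in link(sample):
        print(a["source"], b["source"], round(s, 2))
```

The blocking step is the part that would map onto a distributed job: the blocking key plays the role of the shuffle key, and the pairwise comparison runs inside each group. With 30+ sources, the per-source mapping inside `normalize` is where most of the schema-specific work would go.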
