Suggestions for a record linkage process

Hello,

I have 30+ data sources, each with its own schema of "Person" data (some mix of SSN, DOB, first and last name, etc.). My goal is to use existing technologies or libraries, preferably open source, to compare these records for record linkage and then consolidate the data following master data management (MDM) practices. I am currently thinking about ingesting all of the data into NiFi, normalizing it, and then comparing the records with either a MapReduce job or an existing tool. Does anyone have suggestions for existing technologies that might help accomplish this? All input is appreciated!
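
As a rough illustration of the normalize-then-compare step described above, here is a minimal Python sketch using only the standard library. It is not tied to NiFi or MapReduce. The field names (`source`, `ssn`, `dob`, `first`, `last`), the choice of DOB as the blocking key, and the 0.85 threshold are all assumptions made for the example, not part of any existing tool.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Map a raw record onto one common schema (field names are hypothetical)."""
    full_name = record.get("first", "") + " " + record.get("last", "")
    return {
        "source": record.get("source", ""),
        # Keep digits only so "123-45-6789" and "123456789" compare equal.
        "ssn": "".join(ch for ch in record.get("ssn", "") if ch.isdigit()),
        "dob": record.get("dob", "").strip(),
        # Lowercase and collapse whitespace in the combined name.
        "name": " ".join(full_name.lower().split()),
    }


def similarity(a, b):
    """Return 1.0 on an exact non-empty SSN match, else a fuzzy name score."""
    if a["ssn"] and a["ssn"] == b["ssn"]:
        return 1.0
    return SequenceMatcher(None, a["name"], b["name"]).ratio()


def link(records, threshold=0.85):
    """Group records by DOB (blocking), then compare pairs within each group."""
    blocks = defaultdict(list)
    for rec in map(normalize, records):
        blocks[rec["dob"]].append(rec)

    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                matches.append((a, b))
    return matches
```

Blocking on DOB keeps the number of pairwise comparisons manageable, but it also means two records with different or missing DOB values are never compared, even if their SSNs match. A real pipeline would likely need several blocking keys and a better-calibrated scoring model.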