
Best open-source tools to achieve data governance, lineage, catalog, and privacy for big data?

Our solution consists of NiFi -> Kafka -> Spark as the ingestion, storage, and processing platforms. The final computed XML data is delivered to PostgreSQL. We are required to turn this solution into a central data repository hub that provides data governance, lineage, data discovery, data security, and privacy features, so that end users can add their own data sources, orchestrations, etc. What are the best open-source tools available to meet these requirements?
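
To make the lineage requirement concrete, here is a minimal sketch of the kind of question we would want a lineage tool to answer: "which upstream stages fed this dataset?" It is plain Python and does not use the API of any particular tool; the stage names and the `trace_lineage` helper are hypothetical and only mirror the pipeline described above.

```python
from collections import defaultdict

# Hypothetical stage names mirroring the pipeline in this post.
# Each pair is (source stage, target stage).
PIPELINE_EDGES = [
    ("nifi", "kafka"),
    ("kafka", "spark"),
    ("spark", "postgresql"),
]


def build_upstream_index(edges):
    """Map each stage to the set of stages that feed it directly."""
    upstream = defaultdict(set)
    for source, target in edges:
        upstream[target].add(source)
    return upstream


def trace_lineage(stage, edges):
    """Return every upstream stage of `stage`, nearest first (breadth-first)."""
    upstream = build_upstream_index(edges)
    ordered, seen, frontier = [], set(), [stage]
    while frontier:
        current = frontier.pop(0)
        # Sort parents so the result is deterministic when a stage has several inputs.
        for parent in sorted(upstream.get(current, ())):
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                frontier.append(parent)
    return ordered


if __name__ == "__main__":
    # Which stages contributed to the data that lands in PostgreSQL?
    print(trace_lineage("postgresql", PIPELINE_EDGES))
```

Ideally the tool would capture these edges automatically as users add their own data sources and orchestrations, rather than relying on a hand-maintained list like the one above.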

 

Based on my findings so far, Dremio looks like the better fit for governance, security, workload management, and multi-tenancy; Apache Atlas for metadata management; and Apache Ranger for data access security.