Created on 12-07-2015 06:38 PM - edited 09-16-2022 02:51 AM
Hi
I am looking for best practices around data quality testing for Hive/Pig/Oozie-based ETL.
The client is looking at tools like DataFlux Data Quality for Hadoop.
If there are any alternative recommendations, please add them to this question.
Created 12-07-2015 08:36 PM
https://www.trifacta.com/ - can be used for data quality checks during cleansing
http://vis.stanford.edu/wrangler/ - open source version of Trifacta
https://www.talend.com/resource/data-quality-tools.html
https://www.ataccama.com/products/big-data-platform-for-hadoop/big-data-engine
Created 12-07-2015 08:49 PM
Hi Neeraj - Trifacta seems to be a data wrangling tool; does it also provide data quality measures out of the box?
Created 12-07-2015 08:58 PM
@pbalasundaram I have heard that it can be used for data quality checks while wrangling.
https://www.trifacta.com/wp-content/uploads/2014/01/Trifacta_DataTransformValue_WP.pdf
Created 02-03-2016 03:48 PM
@pbalasundaram are you still having issues with this? Can you accept best answer or provide your own solution?
Created 05-25-2016 04:41 AM
Hi Neeraj, for data quality testing, is there a model script written in Pig or Spark that we could use instead of a tool? Thanks.
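A script-based approach usually boils down to a handful of rules (not-null, range, uniqueness) evaluated over a table, with the failure counts deciding whether the pipeline continues. The following is a minimal sketch in plain Python, not a Pig or Spark script and not anything shipped with the tools above. The rule names, the helper functions (`not_null`, `in_range`, `run_rules`, `duplicate_keys`), and the sample rows are all hypothetical. The assumption is that in a real pipeline the same predicates would be expressed as Spark DataFrame filters or Pig `FILTER` statements over the Hive table.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Rule:
    """A named data quality rule: a predicate every row should satisfy."""
    name: str
    check: Callable[[Row], bool]


def not_null(column: str) -> Rule:
    """Hypothetical rule: the column must have a value."""
    return Rule(f"{column} is not null", lambda row: row.get(column) is not None)


def in_range(column: str, low: float, high: float) -> Rule:
    """Hypothetical rule: the value must lie in [low, high].

    Nulls pass here on purpose; they are reported by not_null instead.
    """
    return Rule(
        f"{column} in [{low}, {high}]",
        lambda row: row.get(column) is None or low <= row[column] <= high,
    )


def run_rules(rows: List[Row], rules: List[Rule]) -> Dict[str, int]:
    """Count the failing rows for each rule."""
    failures = {rule.name: 0 for rule in rules}
    for row in rows:
        for rule in rules:
            if not rule.check(row):
                failures[rule.name] += 1
    return failures


def duplicate_keys(rows: List[Row], key: str) -> List[Any]:
    """Return key values that occur more than once (uniqueness check)."""
    seen, dupes = set(), []
    for row in rows:
        value = row.get(key)
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes


if __name__ == "__main__":
    # Hypothetical sample extract from a Hive table.
    sample = [
        {"id": 1, "age": 34},
        {"id": 2, "age": None},
        {"id": 2, "age": 150},
    ]
    rules = [not_null("age"), in_range("age", 0, 120)]
    print(run_rules(sample, rules))  # → {'age is not null': 1, 'age in [0, 120]': 1}
    print(duplicate_keys(sample, "id"))  # → [2]
```

Keeping the counts per rule (rather than a single pass/fail flag) makes it easy to set different tolerances per rule. One option, under the assumption that the check runs as an Oozie shell action, is to exit with a non-zero status when any count exceeds its threshold so the workflow stops before loading bad data.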
Created 12-07-2020 11:36 AM
2020 update: what are the preferred data quality tools compatible with CDH for Hive, HBase, and Solr? Our team is looking at Apache Griffin.
Regards,
Nithya Koka