I am looking for best practices around data quality testing for Hive/Pig/Oozie-based ETL.
The client is looking at tools such as DataFlux data quality for Hadoop.
If there are any alternative recommendations, please update this question.
https://www.trifacta.com/ - Trifacta can be used for data quality checks during cleansing.
http://vis.stanford.edu/wrangler/ - Data Wrangler, the open source Stanford research project that Trifacta grew out of.
Hi Neeraj - Trifacta seems to be a data wrangling tool. Does it also provide data quality measures OOTB?
@pbalasundaram I have heard that it can be used for data quality checks while wrangling.
@pbalasundaram are you still having issues with this? Can you accept the best answer or provide your own solution?
Hi Neeraj, for data quality testing, is there a model script developed in Pig or Spark, rather than using a tool? Thanks.
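No model script is posted in this thread, so the following is only a minimal sketch of the kind of checks such a script usually runs: null rate, key uniqueness, and value range. It is written in plain Python to stay self-contained; the same logic could be ported to Pig or Spark. The function names, the sample columns `order_id` and `amount`, and the thresholds are all hypothetical and do not come from any tool mentioned here.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def null_rate(rows: List[Row], column: str) -> float:
    """Fraction of rows where the column is missing, None, or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) in (None, ""))
    return missing / len(rows)


def duplicate_keys(rows: List[Row], key: str) -> List[Any]:
    """Key values that occur more than once, in first-repeat order."""
    seen = set()
    dupes: List[Any] = []
    for r in rows:
        k = r.get(key)
        if k in seen and k not in dupes:
            dupes.append(k)
        seen.add(k)
    return dupes


def out_of_range(rows: List[Row], column: str, low: float, high: float) -> int:
    """Count of non-null values that fall outside [low, high]."""
    return sum(
        1
        for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    )


def run_checks(rows: List[Row], key: str, column: str,
               low: float, high: float, max_null_rate: float) -> Dict[str, bool]:
    """Run all checks and report pass/fail per check."""
    return {
        "null_rate_ok": null_rate(rows, column) <= max_null_rate,
        "unique_key_ok": not duplicate_keys(rows, key),
        "range_ok": out_of_range(rows, column, low, high) == 0,
    }


# Hypothetical sample data standing in for rows read from a Hive table.
sample_rows: List[Row] = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 3, "amount": 20.0},
]

if __name__ == "__main__":
    print(run_checks(sample_rows, "order_id", "amount", 0.0, 1000.0, 0.1))
```

In an Oozie workflow, a script like this could run as a step after the load and fail the job when any check returns False, but how to wire that up depends on the cluster setup.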
2020 update: what are the preferred data quality tools compatible with CDH for Hive, HBase, and Solr? Our team is looking at Apache Griffin.
Regards, Nithya Koka