Recommended data quality test suite for Hive / Pig / Oozie

Contributor

Hi

I am looking for best practices around data quality testing for Hive / Pig / Oozie based ETL.

The client is looking at tools like DataFlux Data Quality for Hadoop.

If there are any alternative recommendations, please update this question.

1 ACCEPTED SOLUTION
6 REPLIES

Contributor

Hi Neeraj - Trifacta seems to be a data wrangling tool. Does it also provide data quality measures out of the box (OOTB)?

@pbalasundaram I have heard that it can be used for data quality checks while wrangling.

https://www.trifacta.com/wp-content/uploads/2014/01/Trifacta_DataTransformValue_WP.pdf

Mentor

@pbalasundaram are you still having issues with this? Can you accept the best answer or provide your own solution?

Explorer

Hi Neeraj, for data quality testing, is there a model script developed in Pig or Spark, rather than using a tool? Thanks.
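
For illustration only, here is a minimal sketch of the kind of checks such a script might run: completeness (null counts), uniqueness (duplicate keys), and validity (range checks). It is plain Python rather than Pig or Spark, and the function name `run_quality_checks`, the column names, and the thresholds are all hypothetical; the same rules would need to be re-expressed as Pig or Spark jobs to run on a cluster.

```python
from collections import Counter


def run_quality_checks(rows, key, required, ranges):
    """Return simple data quality metrics for a list of dict rows.

    All names and parameters here are illustrative assumptions,
    not part of any Hive, Pig, Oozie, or Spark API.
    """
    # Completeness: rows where a required column is missing or empty.
    null_counts = {
        col: sum(1 for r in rows if r.get(col) in (None, ""))
        for col in required
    }

    # Uniqueness: non-null key values that occur more than once.
    key_counts = Counter(r.get(key) for r in rows)
    duplicate_keys = sorted(
        k for k, n in key_counts.items() if k is not None and n > 1
    )

    # Validity: non-null values outside the allowed [low, high] range.
    out_of_range = {
        col: sum(
            1
            for r in rows
            if r.get(col) not in (None, "") and not (low <= r[col] <= high)
        )
        for col, (low, high) in ranges.items()
    }

    return {
        "row_count": len(rows),
        "null_counts": null_counts,
        "duplicate_keys": duplicate_keys,
        "out_of_range": out_of_range,
    }


# Hypothetical sample data standing in for the output of an ETL step.
rows = [
    {"id": 1, "name": "a", "age": 34},
    {"id": 2, "name": "", "age": 29},
    {"id": 2, "name": "c", "age": -5},
    {"id": 3, "name": "d", "age": None},
]

report = run_quality_checks(
    rows, key="id", required=["name", "age"], ranges={"age": (0, 120)}
)
print(report)
```

A job like this could fail the workflow step when any metric exceeds an agreed threshold, which is one way to gate downstream actions on data quality.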

Explorer

2020 update: what are the preferred data quality tools compatible with CDH for Hive, HBase and Solr? Our team is looking at Apache Griffin.

Regards, 
Nithya Koka
