Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Recommended data quality test suite for Hive / Pig / Oozie

Expert Contributor

Hi

I am looking for best practices around data quality testing for Hive / Pig / Oozie-based ETL.

The client is looking at tools like DataFlux Data Quality for Hadoop.

If there are any alternative recommendations, please add them to this question.
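A common starting point for testing a Hive/Pig ETL step, with or without a dedicated tool, is reconciling what was loaded against its source: compare row counts, then compare rows by business key. The sketch below assumes both sides have already been pulled into Python as lists of dicts (for example, from the staging table and the target table). `reconcile`, `index_by_key` and the `id` key are hypothetical names chosen for illustration; they are not part of Hive, Pig, Oozie or any DataFlux product.

```python
"""Source-to-target reconciliation check (hypothetical sketch, plain Python).

Assumes both sides were already extracted as lists of dicts; the helper names
and the key column are illustrative only.
"""
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def index_by_key(rows: Iterable[Row], key: str) -> Dict[Any, Row]:
    """Map each key value to its row; a later duplicate key overwrites an earlier one."""
    return {row[key]: row for row in rows}


def reconcile(source: List[Row], target: List[Row], key: str) -> Dict[str, Any]:
    """Compare row counts and per-key row contents between source and target."""
    src = index_by_key(source, key)
    tgt = index_by_key(target, key)
    missing = sorted(src.keys() - tgt.keys())     # in source, never loaded
    unexpected = sorted(tgt.keys() - src.keys())  # in target, not in source
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    matched = len(src) - len(missing) - len(changed)
    return {
        "source_count": len(source),  # raw row counts, duplicates included
        "target_count": len(target),
        "missing_in_target": missing,
        "unexpected_in_target": unexpected,
        "changed": changed,
        # Share of distinct source keys that arrived with identical contents.
        "match_rate": matched / len(src) if src else 1.0,
    }


source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
target = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}, {"id": 4, "amt": 40}]
report = reconcile(source, target, "id")
print(report)
```

Here the row counts agree (3 and 3) even though only one of three source rows arrived intact, which is why a key-level comparison is worth adding on top of a count check. In an Oozie workflow, a check like this could run as its own action after the load and fail the workflow when the match rate drops below a threshold; that threshold is a policy decision, not something the sketch chooses.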


Master Mentor

Expert Contributor

Hi Neeraj, Trifacta seems to be a data wrangling tool. Does it also provide data quality measures out of the box (OOTB)?

Master Mentor

@pbalasundaram I have heard that it can be used to address data quality while wrangling.

https://www.trifacta.com/wp-content/uploads/2014/01/Trifacta_DataTransformValue_WP.pdf

Master Mentor

@pbalasundaram, are you still having issues with this? Can you accept the best answer or provide your own solution?

Rising Star

Hi Neeraj, for data quality testing, is there a model script developed in Pig or Spark, rather than using a tool? Thanks.
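No model script was posted in this thread. As a rough idea of what such a script usually contains, the sketch below expresses three common rule types (completeness, uniqueness, and validity as a value range) as small functions and runs them over a list of row dicts. It is plain Python, not Pig or Spark; the helper names (`not_null`, `unique`, `in_range`, `run_checks`), the sample columns and the 0 to 130 age bound are all hypothetical.

```python
"""Rule-based column checks (hypothetical sketch; plain Python stand-in for a Pig/Spark job)."""
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], List[int]]  # returns indexes of failing rows


def not_null(column: str) -> Check:
    """Completeness: the column must be present and not None."""
    return lambda rows: [i for i, r in enumerate(rows) if r.get(column) is None]


def unique(column: str) -> Check:
    """Uniqueness: flag every row whose non-null value was already seen."""
    def check(rows: List[Row]) -> List[int]:
        seen, dupes = set(), []
        for i, r in enumerate(rows):
            value = r.get(column)
            if value is None:
                continue  # nulls are left to the not_null rule
            if value in seen:
                dupes.append(i)
            else:
                seen.add(value)
        return dupes
    return check


def in_range(column: str, low: float, high: float) -> Check:
    """Validity: non-null values must lie within [low, high]."""
    return lambda rows: [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]


def run_checks(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, List[int]]:
    """Apply every named check and report only the ones that found failures."""
    results = {name: check(rows) for name, check in checks.items()}
    return {name: bad for name, bad in results.items() if bad}


rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 2, "age": 41},
    {"id": 4, "age": 212},
]
failures = run_checks(rows, {
    "id_not_null": not_null("id"),
    "id_unique": unique("id"),
    "age_not_null": not_null("age"),
    "age_range": in_range("age", 0, 130),
})
print(failures)
```

Keeping each rule as a separate named function makes the report say which rule failed and on which rows. In a Spark job the same rules would typically become filters or aggregations over a DataFrame, but that translation is left out here rather than guessed at.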

Explorer

2020 update: what are the preferred data quality tools compatible with CDH for Hive, HBase and Solr? Our team is looking at Apache Griffin.

Regards, 
Nithya Koka