Recommended data quality test suite for Hive / Pig / Oozie

Contributor

Hi

I am looking for best practices around data quality testing for Hive / Pig / Oozie based ETL.

The client is looking at tools like DataFlux data quality for Hadoop.

If there are any alternative recommendations, please add them to this question.

1 ACCEPTED SOLUTION

Re: Recommended data quality test suite for Hive / Pig / Oozie

5 REPLIES

Re: Recommended data quality test suite for Hive / Pig / Oozie

Re: Recommended data quality test suite for Hive / Pig / Oozie

Contributor

Hi Neeraj, Trifacta seems to be a data wrangling tool. Does it also provide data quality measures out of the box (OOTB)?

Re: Recommended data quality test suite for Hive / Pig / Oozie

@pbalasundaram I have heard that it can be used for data quality checks while wrangling.

https://www.trifacta.com/wp-content/uploads/2014/01/Trifacta_DataTransformValue_WP.pdf

Re: Recommended data quality test suite for Hive / Pig / Oozie

Mentor

@pbalasundaram are you still having issues with this? Can you accept the best answer or provide your own solution?

Re: Recommended data quality test suite for Hive / Pig / Oozie

New Contributor

Hi Neeraj, for data quality testing, is there a model script developed in Pig or Spark, rather than using a tool? Thanks.
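
For illustration only, a tool-free check of the kind asked about here might look like the minimal sketch below. It is plain Python rather than Pig or Spark, and nothing in it comes from the thread: the `run_checks` helper, its parameters, and the sample columns (`id`, `amount`) are all hypothetical. The three rules it applies (missing values, duplicate keys, out-of-range values) are common data quality checks that could likely be expressed as aggregate queries in Hive or Spark as well.

```python
from collections import Counter


def run_checks(rows, required, unique_key, ranges):
    """Return simple data quality metrics for a list of dict rows.

    All names here are illustrative assumptions, not part of any tool.
    """
    # Count missing (None or empty string) values per required column.
    nulls = {
        col: sum(1 for r in rows if r.get(col) in (None, ""))
        for col in required
    }

    # Count extra occurrences of key values that appear more than once.
    key_counts = Counter(r.get(unique_key) for r in rows)
    duplicates = sum(c - 1 for c in key_counts.values() if c > 1)

    # Count non-null values outside the inclusive [low, high] range.
    out_of_range = {}
    for col, (low, high) in ranges.items():
        out_of_range[col] = sum(
            1
            for r in rows
            if r.get(col) is not None and not (low <= r[col] <= high)
        )

    return {
        "total": len(rows),
        "nulls": nulls,
        "duplicates": duplicates,
        "out_of_range": out_of_range,
    }


# Hypothetical sample data: one missing amount, one duplicate id,
# and one negative amount.
sample = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": -5.0},
]

report = run_checks(
    sample,
    required=["id", "amount"],
    unique_key="id",
    ranges={"amount": (0.0, 1000.0)},
)
print(report)
```

A check like this could run as a step after each ETL stage, with the job failing when a metric crosses a threshold the team agrees on; the thresholds themselves would have to come from the data owners.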