
Data quality analysis

Rising Star

Hi,

Could you share some details on how to analyse the quality of data loaded into Hive?

I have a text file of around 250 million records, which I have loaded into Hive and stored in Parquet format. My next task is to analyse the quality of the data. Since I am not from an ETL background, this is new to me. Could you share some approaches that can be applied to Hive tables? I would prefer Spark or Pig.

Thanks in advance!

Accepted solution

Super Guru

Here are some tools that can help you cleanse the data and gain insight into it:

https://www.talend.com/resource/data-quality-tools.html

https://www.trifacta.com/

Alternatively, you can write custom scripts to perform the quality analysis yourself.
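As a minimal sketch of the custom-script route, assuming the records are available as rows of column-name-to-value dictionaries (for example, a small sample pulled from the Hive table), the plain Python below computes common data-quality metrics (missing values, distinct counts, the most frequent value, values that fail a type check, exact duplicate rows) and applies a simple cleaning pass. The names `profile`, `clean`, `is_missing` and `is_number`, and the sample columns, are hypothetical. For 250 million rows the same logic would be expressed as Spark DataFrame operations, for example `pyspark.sql.functions.countDistinct`, `trim`, `DataFrame.na.drop(subset=...)` and `DataFrame.dropDuplicates()`, rather than run in a single Python process; this version only illustrates the checks.

```python
from collections import Counter

# Hypothetical helpers: rows are assumed to be dicts of column name -> value,
# e.g. a small sample pulled from the Hive table for inspection.


def is_missing(value):
    """Treat None and blank/whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def is_number(value):
    """Example validator: True if the value parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile(rows, columns, validators=None):
    """Per-column missing/distinct/invalid counts plus an exact-duplicate count."""
    validators = validators or {}
    total = len(rows)
    report = {}
    for col in columns:
        present = [row.get(col) for row in rows if not is_missing(row.get(col))]
        missing = total - len(present)
        check = validators.get(col)
        report[col] = {
            "missing": missing,
            "missing_pct": round(100.0 * missing / total, 2) if total else 0.0,
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
            # Only counted when a validator is registered for this column.
            "invalid": sum(1 for v in present if check and not check(v)),
        }
    # Rows that repeat an earlier row exactly (on the profiled columns).
    counts = Counter(tuple(row.get(c) for c in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"rows": total, "duplicate_rows": duplicates, "columns": report}


def clean(rows, columns, required):
    """Trim strings, blank -> None, drop rows missing required columns, dedupe."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {}
        for col in columns:
            value = row.get(col)
            if isinstance(value, str):
                value = value.strip() or None  # blank string becomes None
            record[col] = value
        if any(record[col] is None for col in required):
            continue  # a required field is missing
        key = tuple(record[col] for col in columns)
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical sample data for illustration only.
sample = [
    {"id": "1", "country": "US", "amount": "10.5"},
    {"id": "2", "country": " ", "amount": "abc"},
    {"id": "3", "country": "US", "amount": None},
    {"id": "3", "country": "US", "amount": None},
]
print(profile(sample, ["id", "country", "amount"], {"amount": is_number}))
print(clean(sample, ["id", "country", "amount"], required=["id", "country"]))
```

Here `is_number` is just one example of a validator; checks such as date formats, allowed value sets or numeric ranges can be added to the `validators` dictionary the same way. Keeping profiling and cleaning separate lets you run the profile before and after cleaning and compare the numbers.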


Replies

Rising Star

Thank you. Do you know of any generic Spark scripts for data profiling and data cleaning that you could share?