Data quality analysis

Explorer

Hi,

Could you share some guidance on analysing the quality of data that has been loaded into Hive?

I have a text file of around 250 million records, which I have loaded into Hive and stored as Parquet. My next task is to analyse the quality of that data. Since I am not from an ETL background, this is new to me. Could you share some approaches that can be used on Hive tables? I would prefer Spark or Pig.

Thanks in advance!

1 ACCEPTED SOLUTION

These tools can help you cleanse the data and give you insight into it:

https://www.talend.com/resource/data-quality-tools.html

https://www.trifacta.com/

Alternatively, you can write a custom script to analyse the data quality yourself.
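As a rough illustration of what such a custom script might look like, here is a minimal sketch that builds a single profiling query (row count, plus null count, distinct count, minimum and maximum per column). It is only one possible starting point, not a complete data quality framework. The function name build_profile_query and the table and column names are made up for the example, and it assumes column names contain no backticks.

```python
def build_profile_query(table, columns):
    """Build one SQL statement that profiles the given columns of a table.

    Hypothetical helper for illustration; it only produces the SQL text.
    """
    if not columns:
        raise ValueError("at least one column is required")

    # Total row count, computed once for the whole table.
    parts = ["COUNT(*) AS row_count"]

    for col in columns:
        quoted = "`{}`".format(col)  # backtick quoting, as used by Hive and Spark SQL
        # Number of NULL values in this column.
        parts.append(
            "SUM(CASE WHEN {q} IS NULL THEN 1 ELSE 0 END) AS `{c}_nulls`".format(q=quoted, c=col)
        )
        # Number of distinct non-NULL values (can be expensive on large tables).
        parts.append("COUNT(DISTINCT {q}) AS `{c}_distinct`".format(q=quoted, c=col))
        # Smallest and largest values, useful for spotting out-of-range data.
        parts.append("MIN({q}) AS `{c}_min`".format(q=quoted, c=col))
        parts.append("MAX({q}) AS `{c}_max`".format(q=quoted, c=col))

    return "SELECT\n  " + ",\n  ".join(parts) + "\nFROM " + table


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(build_profile_query("mydb.customers", ["customer_id", "email"]))
```

The generated statement can be run from the Hive shell, or from a Spark session created with Hive support by passing it to spark.sql(...) and calling show() on the result. With around 250 million records, several exact distinct counts in one query may be slow; in Spark SQL, approx_count_distinct is a cheaper substitute if an estimate is good enough.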

2 REPLIES

Explorer

Thank you. Do you know of any generic Spark scripts for data profiling and data cleaning that you could share?