Data quality analysis
Labels: Apache Pig, Apache Spark
Created ‎05-25-2016 04:30 AM
Hi,
Could you share some details on analysing the quality of data loaded into Hive?
I have a text file of around 250 million records, which I have loaded into Hive and stored as Parquet. My next task is to analyse the quality of the data. Since I am not from an ETL background, this is new to me. Could you share some approaches that can be used on Hive tables? I would prefer Spark or Pig.
Thanks in advance!
Created ‎05-25-2016 05:51 AM
These are some tools that can help you cleanse the data and give you insight into it:
https://www.talend.com/resource/data-quality-tools.html
Alternatively, you can write a custom script to perform the quality analysis yourself.
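As a rough idea of what such a custom script usually computes, here is a minimal, library-free profiling sketch. It assumes rows arrive as Python dictionaries and that each column holds mutually comparable values; the function name `profile_columns` is hypothetical. It reports, per column, the row count, null/blank count, null ratio, distinct count and min/max. Keeping every distinct value in a set only suits a sample; on the full 250-million-row table you would compute the same aggregates in Spark (for example with `count`, `countDistinct` or `approx_count_distinct`, `min` and `max` from `pyspark.sql.functions`) rather than in plain Python.

```python
from typing import Any, Dict, Iterable, List


def profile_columns(rows: Iterable[Dict[str, Any]],
                    columns: List[str]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: simple per-column data-quality metrics.

    Missing keys, None and blank strings are all counted as nulls.
    Assumes non-null values within one column are comparable (for min/max).
    """
    stats = {c: {"nulls": 0, "values": set(), "min": None, "max": None}
             for c in columns}
    total = 0
    for row in rows:
        total += 1
        for c in columns:
            v = row.get(c)
            if v is None or (isinstance(v, str) and v.strip() == ""):
                stats[c]["nulls"] += 1
                continue
            s = stats[c]
            s["values"].add(v)  # fine for a sample, too costly for the full table
            s["min"] = v if s["min"] is None else min(s["min"], v)
            s["max"] = v if s["max"] is None else max(s["max"], v)

    report = {}
    for c in columns:
        s = stats[c]
        report[c] = {
            "rows": total,
            "nulls": s["nulls"],
            "null_ratio": s["nulls"] / total if total else 0.0,
            "distinct": len(s["values"]),
            "min": s["min"],
            "max": s["max"],
        }
    return report
```

For example, calling `profile_columns(sample_rows, ["id", "city", "amount"])` on a few rows pulled from the Hive table gives a quick view of which columns have many nulls, suspicious ranges or unexpectedly low cardinality.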
Created ‎05-25-2016 05:55 AM
Thank you. Do you know of any generic Spark scripts for data profiling and data cleansing that you could share?
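A common generic pattern for cleansing is rule-based validation: declare a list of named checks, then split the data into clean rows and rejected rows annotated with the rules they failed. The sketch below is plain Python under the assumption that rows are dictionaries; the rule builders `not_null`, `in_range` and `matches` and the function `split_by_rules` are hypothetical names, not part of any library. In Spark the same idea would typically be expressed as column conditions passed to `DataFrame.filter`, once for the valid rows and once for their negation.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Rule = Tuple[str, Callable[[Row], bool]]  # (failure message, check returning True if OK)


def not_null(col: str) -> Rule:
    # Fails when the column is missing, None or an empty string.
    return (f"{col} is null", lambda r: r.get(col) not in (None, ""))


def in_range(col: str, lo: float, hi: float) -> Rule:
    # Fails when the value is not numeric or lies outside [lo, hi].
    def check(r: Row) -> bool:
        v = r.get(col)
        return isinstance(v, (int, float)) and lo <= v <= hi
    return (f"{col} out of range [{lo}, {hi}]", check)


def matches(col: str, pattern: str) -> Rule:
    # Fails when the value is not a string fully matching the regex.
    compiled = re.compile(pattern)
    return (f"{col} does not match pattern",
            lambda r: isinstance(r.get(col), str)
            and compiled.fullmatch(r[col]) is not None)


def split_by_rules(rows: Iterable[Row], rules: List[Rule]
                   ) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Return (clean rows, [(rejected row, failed rule messages)])."""
    clean: List[Row] = []
    rejected: List[Tuple[Row, List[str]]] = []
    for row in rows:
        failures = [msg for msg, check in rules if not check(row)]
        if failures:
            rejected.append((row, failures))
        else:
            clean.append(row)
    return clean, rejected
```

Keeping the failure messages alongside each rejected row makes it easy to count how often each rule fails, which doubles as a data-quality report.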
