Created 08-08-2016 03:37 PM
Hi, I have four tables stored as .csv files. All of them can be connected through a fact table (which is also a .csv file). I want to do some data cleansing on these files and then load them into one big table in Hive. In Apache Pig, should I create a separate script for each table, or is it better to join the tables in Pig first and then apply the data cleansing to the joined table? Thanks!
Created 08-08-2016 07:20 PM
Hi @João Souza
Personally, I'd create a separate script for each table. That way I can focus on a single table if something changes, rather than modifying one larger script that covers all the tables, which would mean more code and a steeper learning curve for another developer.
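As a rough illustration of the per-table approach, a cleansing script for one of the dimension tables might look something like the sketch below. The file paths, relation names, and columns (`customer_id`, `name`, `city`) are all made up for the example, since the thread doesn't describe the actual schema, and the cleansing rules are only placeholders for whatever each table really needs.

```pig
-- clean_customers.pig: cleansing for ONE table only (hypothetical schema and paths)

-- Load the raw CSV. PigStorage(',') splits on commas only; it does not
-- understand quoted fields or skip a header row.
customers_raw = LOAD 'raw/customers.csv' USING PigStorage(',')
    AS (customer_id:int, name:chararray, city:chararray);

-- Values that cannot be cast to int come back as null, so this filter
-- drops rows with a missing or malformed key (including a header line).
customers_valid = FILTER customers_raw BY customer_id IS NOT NULL;

-- Example clean-up rules: trim whitespace and normalise the case of city.
customers_clean = FOREACH customers_valid GENERATE
    customer_id,
    TRIM(name)        AS name,
    UPPER(TRIM(city)) AS city;

-- Remove exact duplicate rows.
customers_dedup = DISTINCT customers_clean;

-- Write the cleaned table to its own output directory.
STORE customers_dedup INTO 'clean/customers' USING PigStorage(',');
```

Each of the other tables would get a similar small script, and a final script (or a Hive query) could then join the cleaned outputs to the fact table. If the target is a Hive table rather than a directory, HCatalog's `HCatStorer` is one option for the `STORE` step, assuming HCatalog is set up on the cluster.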