04-11-2018 06:02 PM - last edited on 04-12-2018 07:11 AM by cjervis
I have a number of zipped files that I moved into HDFS. Unfortunately, the content is not CSV formatted.
Now I want to create tables in Hive and move my data there.
The data is written for each subscriber in 2 parts (of different sizes), in "field=value" format.
Each row is terminated by ";", as shown below:
Read:Part1 (CustomerId="50000508", Key=1, StartDateDate=2017-01-01T22:59:59Z, EndDate=2030-01-08T22:59:59Z, IsActive=true);
Read:Part2 (CustomerId="50000508", Balance=1200, Status=Valid);
Read:Part1(CustomerId="50000506", Key=1, StartDateDate=2016-08-01T22:59:59Z, EndDate=2027-01-08T22:59:59Z, IsActive=true);
Read:Part2 (CustomerId="50000506", Balance=1850, Status=Valid);
My problem now is how to extract only the needed values from each record and insert them into my Hive tables.
Is there a way to do this using Hive?
My goal is to end up with 2 tables, one for Part1 and one for Part2, and to select only some fields from each row. For example, only CustomerId and IsActive from Part1, and CustomerId and Status from Part2.
Could this be done in one shot from the start, or in 2 steps (creating a table with all fields and then moving only the needed columns into a second table)?
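To make the extraction concrete, here is a minimal sketch in Python of the parsing logic, done outside Hive as a preprocessing step. It assumes that values never contain commas or parentheses and that every record ends with ");". The names RECORD_RE, WANTED, parse_records and split_tables are made up for illustration, and the sample lines mirror the format shown above.

```python
import re

# Matches one record such as: Read:Part1 (CustomerId="50000508", Key=1);
# Assumption: values never contain ")" or ",".
RECORD_RE = re.compile(r'Read:(Part\d+)\s*\((.*?)\)\s*;', re.DOTALL)

# Hypothetical column choices, taken from the example in the question.
WANTED = {
    "Part1": ["CustomerId", "IsActive"],
    "Part2": ["CustomerId", "Status"],
}

def parse_records(text):
    """Yield (part, fields) for every record found in text."""
    for part, body in RECORD_RE.findall(text):
        fields = {}
        for pair in body.split(","):
            name, _, value = pair.partition("=")
            # Drop surrounding whitespace and optional double quotes.
            fields[name.strip()] = value.strip().strip('"')
        yield part, fields

def split_tables(text):
    """Return {part: rows}, keeping only the wanted columns per part."""
    tables = {part: [] for part in WANTED}
    for part, fields in parse_records(text):
        if part in WANTED:
            tables[part].append([fields.get(col, "") for col in WANTED[part]])
    return tables

sample = '''Read:Part1 (CustomerId="50000508", Key=1, StartDateDate=2017-01-01T22:59:59Z, EndDate=2030-01-08T22:59:59Z, IsActive=true);
Read:Part2 (CustomerId="50000508", Balance=1200, Status=Valid);
Read:Part1(CustomerId="50000506", Key=1, StartDateDate=2016-08-01T22:59:59Z, EndDate=2027-01-08T22:59:59Z, IsActive=true);
Read:Part2 (CustomerId="50000506", Balance=1850, Status=Valid);'''

tables = split_tables(sample)
```

This is the "one shot" variant: each record is routed by its part name and reduced to the needed columns in a single pass, so no intermediate table with all fields is required.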
04-17-2018 07:32 PM - edited 04-17-2018 07:37 PM
Using Pig, you can transform the data into your desired shape and then push the resulting output file into Hive.
You can load the result into HCatalog, an HDFS directory, or the Hive warehouse. I don't think you can insert directly into a Hive table from Pig.
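The "transform, then push the output file" idea does not depend on Pig specifically. As a rough sketch, assuming the rows have already been reduced to the needed columns, the Python snippet below writes them as tab-separated text, which a Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' can read once the file is placed in HDFS. The write_tsv helper and the sample rows are hypothetical.

```python
import csv
import io

def write_tsv(rows, out):
    """Write rows as tab-separated lines to a file-like object."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Hypothetical rows for the Part1 table: CustomerId, IsActive.
part1_rows = [["50000508", "true"], ["50000506", "true"]]

# An in-memory buffer stands in for a real output file here.
buf = io.StringIO()
write_tsv(part1_rows, buf)
tsv_text = buf.getvalue()
```

Note that the csv module would quote any value containing a tab, and Hive's default delimited text format does not interpret such quoting, so this only suits values that are free of tabs and newlines.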