I have a number of zipped files that I moved into HDFS; unfortunately, the content is not CSV formatted.
Now I want to create tables inside Hive and move my data there.
The data for each subscriber is written in 2 parts (of different sizes) in "field=value" format.
Each row is delimited by ";" as below:
Read:Part1 (CustomerId="50000508", Key=1, StartDateDate=2017-01-01T22:59:59Z, EndDate=2030-01-08T22:59:59Z, IsActive=true);
Read:Part2 (CustomerId="50000508", Balance=1200, Status=Valid);
Read:Part1(CustomerId="50000506", Key=1, StartDateDate=2016-08-01T22:59:59Z, EndDate=2027-01-08T22:59:59Z, IsActive=true);
Read:Part2 (CustomerId="50000506", Balance=1850, Status=Valid);
My problem now is how to extract only the needed value from each field and insert it into my Hive table.
Is there a way to do this using Hive?
My goal, in the end, is to create 2 tables, one for Part1 and one for Part2, selecting only some fields from each row: for example, only CustomerId and IsActive from Part1, and CustomerId and Status from Part2.
Could this be done in one shot from the beginning, or maybe in 2 steps (creating a table with all fields, then moving only the needed columns into a second table)?
I'm also open to other ideas (not necessarily Hive).
My goal is to move this data into tables before running SQL and SQL-like commands.
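One way to do this entirely in Hive is the RegexSerDe: point an external table at the raw text files, capture one regex group per field, and then keep only the columns you need in a second table. Below is a minimal sketch for Part1, assuming the files have already been unzipped to plain text (Hive cannot split .zip archives) and assuming a hypothetical HDFS path; the regex follows the sample lines in the question, and RegexSerDe columns must all be STRING:

```sql
-- Step 1: external table over the raw lines, one capture group per column.
-- Lines that do not match the regex come back as all-NULL rows.
CREATE EXTERNAL TABLE part1_raw (
  customer_id STRING,
  key_field   STRING,
  start_date  STRING,
  end_date    STRING,
  is_active   STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = 'Read:Part1 ?\\(CustomerId="([^"]*)", Key=([^,]*), StartDateDate=([^,]*), EndDate=([^,]*), IsActive=([^)]*)\\);?'
)
STORED AS TEXTFILE
LOCATION '/user/hive/landing/part1';   -- hypothetical path, adjust to yours

-- Step 2: keep only the needed columns (the "2 steps" approach).
CREATE TABLE part1 AS
SELECT customer_id, is_active
FROM part1_raw
WHERE customer_id IS NOT NULL;
```

A second pair of statements with a `Read:Part2` regex (capturing CustomerId, Balance, Status) would give you the Part2 table the same way.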
Using Pig you can transform the data into your desired shape and then push the resulting output files into Hive.
You can load the result into HCatalog, an HDFS directory, or the Hive warehouse. I don't think you can perform an insert directly into a Hive table from Pig.
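To make the Pig route concrete, here is a hedged sketch (the input/output paths are assumptions, not from the question): filter the Part2 lines, pull CustomerId and Status out with the built-in REGEX_EXTRACT, and store the result as delimited text that a Hive external table can sit on top of:

```pig
-- Load each record as one line of text.
raw   = LOAD '/user/data/input' USING TextLoader() AS (line:chararray);

-- Keep only the Part2 records.
part2 = FILTER raw BY line MATCHES 'Read:Part2.*';

-- Extract just the fields we need via capture groups.
flds  = FOREACH part2 GENERATE
          REGEX_EXTRACT(line, 'CustomerId="([^"]*)"', 1) AS customer_id,
          REGEX_EXTRACT(line, 'Status=([^,)]*)', 1)      AS status;

-- Write delimited text under a directory a Hive external table points at.
STORE flds INTO '/user/hive/warehouse/part2_ext' USING PigStorage(',');
```

With HCatalog on the classpath, `STORE flds INTO 'part2' USING org.apache.hive.hcatalog.pig.HCatStorer();` is the "load into HCatalog" variant, which writes into an existing Hive table's storage directly.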