Created 05-02-2017 07:24 AM
I am using the .NET SDK to submit Hive and Pig jobs in Azure HDInsight. After the Pig job completes successfully, I load its output file into a Hive table. The Hive job also runs successfully, but the Pig output is in BAG format, so the data stored in the Hive table contains "(" and ")" characters: the first and last columns of the Hive table include them. I want to remove these characters from the Hive table. Do I have to change the Pig output format, or is there a way to remove these characters from the Hive table?
Please suggest a solution.
Thank you.
Created 05-02-2017 08:13 AM
You can use the flatten operator to un-nest the bag, which removes the extra characters: http://pig.apache.org/docs/r0.16.0/basic.html#flatten. Before you finish generating the file with Pig, apply the flatten operator, then load the output into a Hive table.
grunt> cat empty.bag
{} 1
grunt> A = LOAD 'empty.bag' AS (b : bag{}, i : int);
grunt> B = FOREACH A GENERATE flatten(b), i;
grunt> DUMP B;
grunt>
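The snippet above flattens an empty bag, so DUMP prints nothing. As a minimal sketch of the non-empty case, assuming a hypothetical tab-delimited input file `input.bag` whose lines look like `{(alice,10),(bob,20)}<TAB>1` (the file name, field names, and types are illustrative, not taken from the original job):

```pig
-- Hypothetical input: a bag of (name, score) tuples followed by an id,
-- separated by a tab, e.g. {(alice,10),(bob,20)}<TAB>1
A = LOAD 'input.bag' AS (b: bag{t: tuple(name: chararray, score: int)}, id: int);

-- FLATTEN un-nests the bag: one output row per tuple in the bag,
-- with the tuple's fields promoted to top-level columns next to id
B = FOREACH A GENERATE FLATTEN(b), id;

-- Each row of B now has three plain fields (name, score, id)
DUMP B;
```

FLATTEN also applies to tuple-valued fields, which may be the relevant case if the stray characters are parentheses around each row rather than braces.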
Created 05-02-2017 08:05 PM
As Ervits stated, flatten the Pig output first to remove any bags, then use the resulting file to load the Hive table.
To load it into a Hive table, use "LOAD DATA INPATH '<pig output file>' INTO TABLE <hive table name>".
Ensure that the format of the Pig output file is compatible with the Hive table's storage format; for a text table, for example, the field delimiter in the file must match the one the table expects. If it does, you will get the result you are looking for.
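One way to keep the Pig output compatible with a delimited-text Hive table is to choose the field delimiter explicitly when storing. The following is only a sketch under assumptions: the input path, output path, and schema are hypothetical, and the target Hive table is assumed to be declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.

```pig
-- Hypothetical flat input; replace with the relation your job actually produces
raw = LOAD 'input/records.tsv' AS (name: chararray, score: int, id: int);

-- Project only plain scalar fields so no bag or tuple delimiters reach the file
flat = FOREACH raw GENERATE name, score, id;

-- PigStorage(',') writes one comma-separated line per row;
-- the delimiter must match the one declared on the Hive table
STORE flat INTO 'output/for_hive' USING PigStorage(',');
```

The directory written by STORE (here the hypothetical `output/for_hive`) is what the LOAD DATA INPATH statement would then point to.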