I have one hive table and I want to loop thru the table and store into multiple files based on FileID. As the example explained below:
FileID Cust_Name Cust_Age zip
1 Jack Johns 21 10008
1 William Down 51 10008
2 Shara Smith 31 12110
3 edward Miller 34 21222
N Mike Will 30 22212
based on the fileID, how to store the table data into N files? And the file name shall be the zip code because the data are grouped by zip.
Thanks for your help.
every Hive table has two virtual columns called "INPUT__FILE__NAME" and "BLOCK__OFFSET__INSIDE__FILE". I think you need to use the second one or may be a combination of these virtual columns to create new files. First use in select clause to find byte off set of each field id and then use this in where clause when doing a "insert into select from".
Thanks for your suggestion. But these two will not be able to help because my data need exporting into different files based on the fileID field. those two virtual columns have no relation with FileID field, which means the data have the same fileID but their blockID inside the file are different. What's more of my requirement is, the number of fileID is not static. So I need some loop logic or dynamic approach to handle it.