Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to load a bag from a file

Hi,

I am trying to load a file in Pig that contains data like this:

{(3),(mary),(19)}

{(1),(john),(18)}

{(2),(joe),(18)}

The following command is failing:

A = LOAD 'data3' AS (B: bag {T: tuple(t1:int), F:tuple(f1:chararray), G:tuple(g1:int)});

What is the correct way to do this?

Thanks,

Soumya

1 ACCEPTED SOLUTION

Master Guru

I don't think any Pig storage handler does that, which is a bit odd. How did you generate that file? Is it just test data you created by hand?

PigStorage essentially reads and writes delimited files. Fields within a tuple can be maps or bags, but I don't think the top-level record itself can be.

JsonStorage uses JSON, which has a different syntax. Then there is BinStorage, which I suppose is some kind of sequence file.

I may be overlooking something, but I don't think Pig can natively read data in the format it prints for debugging without some transformation. Please correct me if I am wrong.

http://pig.apache.org/docs/r0.14.0/func.html#load-store-functions
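
One possible transformation, sketched here under assumptions and not tested against a cluster: read each line as plain text with TextLoader, then pull the three values out with REGEX_EXTRACT_ALL. The field names id, name, and age are made up for illustration, and the pattern assumes every line has exactly three single-field tuples with no parentheses inside the values.

```pig
-- Read every line of the file as one chararray; no delimiter parsing happens here
raw = LOAD 'data3' USING TextLoader() AS (line:chararray);

-- Character classes such as [{] and [(] stand in for escaped braces and parentheses,
-- so the pattern needs no backslashes. Each group captures the text inside one tuple.
parsed = FOREACH raw GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(line, '[{][(]([^)]*)[)],[(]([^)]*)[)],[(]([^)]*)[)][}]'))
    AS (id:chararray, name:chararray, age:chararray);

-- Cast the captured strings to the intended types (hypothetical field names)
typed = FOREACH parsed GENERATE (int)id AS id, name, (int)age AS age;
DUMP typed;
```

Lines that do not match the pattern would come out as nulls, so it may be worth filtering them before further processing.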

3 REPLIES

Master Mentor

Load the data using PigStorage and then apply the TOBAG function (http://pig.apache.org/docs/r0.15.0/func.html#tobag). Is it a comma-separated file?

a = LOAD 'student' AS (f1:chararray, f2:int, f3:float);
DUMP a;

(John,18,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,18,3.8)

b = FOREACH a GENERATE TOBAG(f1,f3);
DUMP b;

({(John),(4.0)})
({(Mary),(3.8)})
({(Bill),(3.9)})
({(Joe),(3.8)})
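
Applied to the data in the question, the same approach might look like the sketch below. It assumes the file has been rewritten as comma-separated lines such as 3,mary,19; the file name data3.csv and the field names id, name, and age are made up for illustration, and the script has not been run against a cluster.

```pig
-- Assumption: 'data3.csv' is a hypothetical comma-separated copy of the data,
-- one record per line, e.g. 3,mary,19
A = LOAD 'data3.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);

-- TOBAG wraps each argument in its own single-field tuple and collects them in one bag
B = FOREACH A GENERATE TOBAG(id, name, age) AS b;
DUMP B;
```

Because the three fields have different types, the tuples inside the resulting bag do not share a single schema, which is worth keeping in mind before applying typed operations to the bag.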