How to load a bag from a file

Contributor

Hi,

I was trying to load a file in Pig that contains data like this:

{(3),(mary),(19)}

{(1),(john),(18)}

{(2),(joe),(18)}

The following command is failing:

A = LOAD 'data3' AS (B: bag {T: tuple(t1:int), F:tuple(f1:chararray), G:tuple(g1:int)});

What is the correct way to do this?

Thanks,

Soumya

1 ACCEPTED SOLUTION

Master Guru

I don't think there is a Pig storage handler that does that, which is a bit weird, I suppose. How did you generate that file? Is it just test data you created manually?

PigStorage essentially reads and writes delimited files. Fields within a tuple can be maps or bags, but I don't think the top-level record itself can be.

JsonStorage uses JSON, which is a different syntax. Then there is BinStorage, which I suppose is some kind of sequence file.

I might just be missing it, but I think there is no native way in Pig, without some transformation, to read data in the format Pig prints for debugging. Someone please correct me if I am wrong.

http://pig.apache.org/docs/r0.14.0/func.html#load-store-functions

Master Mentor

Load the data using PigStorage and then apply the TOBAG function (http://pig.apache.org/docs/r0.15.0/func.html#tobag). Is it a comma-separated file?

a = LOAD 'student' AS (f1:chararray, f2:int, f3:float);
DUMP a;

(John,18,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,18,3.8)

b = FOREACH a GENERATE TOBAG(f1,f3);
DUMP b;

({(John),(4.0)})
({(Mary),(3.8)})
({(Bill),(3.9)})
({(Joe),(3.8)})
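
Applied to the data in the question, the same idea might look like the sketch below. It assumes the source data is available as a plain comma-separated file (the hypothetical 'students.csv', with lines such as 3,mary,19) rather than in the bag notation shown in the question. The file name and the field names id, name and age are made up for illustration.

```pig
-- Assumption: 'students.csv' is a hypothetical comma-separated file with lines like
--   3,mary,19
-- It is NOT the 'data3' file from the question, which already uses bag notation.
a = LOAD 'students.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);

-- TOBAG wraps each expression in its own single-field tuple
-- and collects those tuples into one bag per input record.
b = FOREACH a GENERATE TOBAG(id, name, age);
DUMP b;
```

Each dumped record should then be a bag of three single-field tuples, the same shape as the TOBAG output above.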