
Group without schema


Hi all,

I have a tab-delimited text file with no header row. The columns are:

Year\tName\tSalary

and the data looks like this:

2015 Marc 100
2016 Marc 200
2017 Marc 300
2015 Lucy 100
2016 Lucy 200
2017 Lucy 300
2015 John 100
2016 John 200
2017 John 300

I want to calculate the average salary for each employee.

By executing the following code:

a = load '/user/horton/salary' as (year:int, name:chararray, salary:int);
b = group a by name;
d = FOREACH b GENERATE group as name, AVG(a.salary) as avgsalary;
describe d;
-- d: {name: chararray,avgsalary: double}
dump d;

I got the expected result:

(Marc,200.0)
(Lucy,200.0)
(John,200.0)
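As a quick sanity check of these numbers, here is a minimal Python sketch (not Pig) that mimics what the first script does: split each tab-separated line, collect the salaries per name (the grouped bag), and average them. The function name average_salary_by_name and the inline sample rows are illustrative assumptions; float division is used because Pig's AVG returns a double.

```python
from collections import defaultdict

# Sample rows from the question: year, name, salary, tab-separated, no header.
LINES = [
    "2015\tMarc\t100", "2016\tMarc\t200", "2017\tMarc\t300",
    "2015\tLucy\t100", "2016\tLucy\t200", "2017\tLucy\t300",
    "2015\tJohn\t100", "2016\tJohn\t200", "2017\tJohn\t300",
]


def average_salary_by_name(lines):
    """Roughly mimic LOAD ... AS (year:int, name:chararray, salary:int),
    then GROUP BY name, then AVG(salary) for each group."""
    groups = defaultdict(list)  # name -> list of salaries (the grouped "bag")
    for line in lines:
        _year, name, salary = line.split("\t")
        groups[name].append(int(salary))
    # True division gives a float, matching AVG's double result type.
    return {name: sum(s) / len(s) for name, s in groups.items()}


if __name__ == "__main__":
    for name, avg in sorted(average_salary_by_name(LINES).items()):
        print(f"({name},{avg})")
```

Each employee earns 100, 200 and 300 over the three years, so every average is 600 / 3 = 200.0.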

But when I tried the following script, which loads the file without a schema:

a = load '/user/horton/salary';
b = FOREACH a GENERATE $0 as year:int, $1 as name:chararray, $2 as salary:int;
-- b: {year: int,name: chararray,salary: int}
c = group b by name;
-- c: {group: chararray,b: {(year: int,name: chararray,salary: int)}}
d = FOREACH c GENERATE group as name, AVG(b.salary) as avgsalary;
describe d;
-- d: {name: chararray,avgsalary: double}
dump d;

I got an error:

ERROR 0: Exception while executing (Name: c: Local Rearrange[tuple]{chararray}(false) - scope-33 Operator Key: scope-33) org.apache.pig.backend.executionengine.ExecException: Error while computing average in Initial

Why?

Can anyone help me?

More generally, what is the recommended approach when a file has many fields and I cannot explicitly declare all the field names in the LOAD statement?

Thanks.

Mauro


This "looked" right at a glance, so I ran your first script (it worked for me too) and then stepped through the second script one line at a time. I ran into the following error on the FOREACH / GENERATE line.

2017-04-06 11:29:05,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias c. Backend error : java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String 

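Without a schema in the LOAD statement, every field comes in as a bytearray (DataByteArray), and in your script the type in the AS clause only declared the schema without converting the data. So I explicitly cast every field, as you'll see in my updated script.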

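-- No schema in LOAD: every field is loaded as bytearray.
a = load '/user/maria_dev/hcc/92577/salary';

-- Cast each positional field explicitly before grouping or averaging.
b = FOREACH a GENERATE (int) $0 as year:int, (chararray) $1 as name:chararray, (int) $2 as salary:int;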
describe b;
dump b;

c = group b by name;
describe c;
dump c;

d = FOREACH c GENERATE group as name, AVG( b.salary ) as avgsalary; 
describe d; 
dump d;

Here are the (expected) results.

b: {year: int,name: chararray,salary: int}
(2015,Marc,100)
(2016,Marc,200)
(2017,Marc,300)
(2015,Lucy,100)
(2016,Lucy,200)
(2017,Lucy,300)
(2015,John,100)
(2016,John,200)
(2017,John,300)
c: {group: chararray,b: {(year: int,name: chararray,salary: int)}}
(John,{(2017,John,300),(2016,John,200),(2015,John,100)})
(Lucy,{(2017,Lucy,300),(2016,Lucy,200),(2015,Lucy,100)})
(Marc,{(2017,Marc,300),(2016,Marc,200),(2015,Marc,100)})
d: {name: chararray,avgsalary: double}
(John,200.0)
(Lucy,200.0)
(Marc,200.0)
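For the more general question about files with many fields, the pattern above is to skip the schema in LOAD and pick only the fields you need by position ($0, $1, ...), casting each one explicitly. The Python sketch below only models that idea: raw fields arrive as bytes (standing in for Pig's bytearray), and each listed position is converted explicitly, while unlisted columns are ignored. The names project_and_cast, to_chararray and SPEC are hypothetical.

```python
def to_chararray(raw: bytes) -> str:
    """Decode a raw field as UTF-8 text (stand-in for Pig's (chararray) cast)."""
    return raw.decode("utf-8")


# Hypothetical projection spec: (position, field name, converter).
# Columns not listed here are simply ignored, however many the file has.
SPEC = [
    (0, "year", int),
    (1, "name", to_chararray),
    (2, "salary", int),
]


def project_and_cast(raw_fields, spec):
    """Pick fields by position and convert each one explicitly, similar in
    spirit to: FOREACH a GENERATE (int) $0 AS year, (chararray) $1 AS name, ..."""
    return {name: convert(raw_fields[pos]) for pos, name, convert in spec}


# Splitting a raw line on tabs yields bytes, like fields loaded without a schema.
raw = b"2015\tMarc\t100\textra\tcolumns".split(b"\t")
print(project_and_cast(raw, SPEC))
```

Python's int() accepts bytes directly, so int(b"2015") gives 2015; the two trailing columns are never touched because SPEC does not mention them.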

Good luck and happy Hadooping!!

Hi Martin, thanks so much! I also tried a = foreach b generate (int)$0 as f1; and that works too. Any idea what the difference is between a = foreach b generate (int)$0 as f1; and a = foreach b generate (int)$0 as f1:int; ? Thanks. Regards, Mauro

The explicit cast (i.e. the "(int)" bit) converts the value from whatever datatype it initially has (bytearray in this case) to the target type. The ":int" formally declares the datatype of the new field you're generating in the schema. As you noticed, both forms work, and my double effort in the example above is almost overkill, but it is something I do pretty consistently. It matters more when you are doing math or calling a function where you cast something and combine it with a value of another datatype, and you want to be 100% sure what datatype the resulting field will have.
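As a loose analogy only (this is Python, not how Pig is implemented): a Python type annotation declares a type without converting anything, while int(...) actually converts the value. That mirrors the distinction between ":int" (declaring the field's type) and "(int)" (casting the value), and why, in the original failing script, declaring the types alone evidently left the bytearray values in place.

```python
raw_salary = b"100"          # like a bytearray field loaded without a schema

declared: int = raw_salary   # annotation only *declares* a type; nothing is converted
converted = int(raw_salary)  # explicit conversion, like Pig's (int) cast

print(type(declared).__name__)   # bytes
print(type(converted).__name__)  # int
```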

Glad to know you are off and running again. If you think it deserves it, I hope you can "accept" my answer above so it'll get annotated with "Best Answer". Again, good luck and happy Hadooping!!

Thanks @Lester Martin. This saved me a lot of time!!

Glad it did. Please check out https://martin.atlassian.net/wiki/x/AunyBQ and if you think I deserve it, kindly click the "Accept" link on my original answer. Thanks!!