1) Need help counting the number of records in Pig batch processing 2) Need to calculate the sum of the values of all records in a batch in Pig

Rising Star

We have records processed in batches, one batch per 30-minute interval. We need to calculate the max (or sum) of the values across all records in a single iteration. Could you help me calculate this, and also help me get a count of the number of records in each batch in Pig?

I need to do this in Pig.

Can you send the commands with an example?

1 REPLY

Guru

The following will give you the total number of records in a file and the sum of the values of one field.

A = LOAD 'data.csv' USING PigStorage(',') AS (name:chararray, salary:int);
B = GROUP A ALL;
X = FOREACH B GENERATE COUNT(A.name), SUM(A.salary);
dump X;

MAX works the same way as SUM: MAX(A.salary) will give you the maximum value of that field across all records.
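As a rough sketch, count, sum and max can be computed together in one pass. It assumes the same two-column file as above; the output aliases (record_count, salary_sum, salary_max) are just illustrative names. It uses COUNT_STAR rather than COUNT because COUNT(A.name) skips records where name is null, which may or may not be what you want.

```pig
-- Assumes the same two-column CSV as above; 'data.csv' stands in for one batch file.
A = LOAD 'data.csv' USING PigStorage(',') AS (name:chararray, salary:int);

-- GROUP ... ALL puts every record into a single group, so the aggregates cover the whole batch.
B = GROUP A ALL;

-- COUNT_STAR counts every record, including ones with null fields;
-- SUM and MAX ignore null salary values.
X = FOREACH B GENERATE
        COUNT_STAR(A)  AS record_count,
        SUM(A.salary)  AS salary_sum,
        MAX(A.salary)  AS salary_max;

DUMP X;
```

Running this once per batch file gives one output tuple per batch.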

Note that DUMP X writes the results to the screen. If you want to persist the results you could STORE them to a file, but that is not a good fit in your case: one output per 30-minute batch means writing many small files, which is not best practice in Hadoop. There are ways to get around this, though. Alternatively, you could insert the results into a Hive table with columns something like: date, filename, count, sum, max.
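One possible way to do the Hive insert is through HCatalog's HCatStorer. Treat this as a sketch under assumptions rather than a tested setup: the table name default.batch_stats, its column names, and the parameters $input and $batch_date are all hypothetical, the Hive table must already exist, and the Pig field names and types have to match its columns.

```pig
-- Hypothetical invocation:
--   pig -useHCatalog -param input=/data/batch_0930.csv -param batch_date=2016-01-01 stats.pig
-- Assumes an existing Hive table default.batch_stats with columns
--   (batch_date string, filename string, record_count bigint, salary_sum bigint, salary_max int).

A = LOAD '$input' USING PigStorage(',') AS (name:chararray, salary:int);
B = GROUP A ALL;

-- One row per batch; field names must line up with the Hive column names.
X = FOREACH B GENERATE
        '$batch_date'  AS batch_date,
        '$input'       AS filename,
        COUNT_STAR(A)  AS record_count,
        SUM(A.salary)  AS salary_sum,
        MAX(A.salary)  AS salary_max;

-- Writes into the Hive table instead of dumping to the screen.
STORE X INTO 'default.batch_stats' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Each batch still writes a small file into the table's directory, so the small-files issue is moved rather than removed; the files may need to be compacted periodically.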
