Apache PIG - If Statement based on a count value


Explorer

Hi experts, I have this statement in Apache Pig:

...
Count = FOREACH data GENERATE SUM(Field);
...

How can I write an IF statement like this?

IF (SUM(Field) > 10) STORE into X; ELSE STORE into Y;

Is it possible to do this? Many thanks!

1 ACCEPTED SOLUTION


Re: Apache PIG - If Statement based on a count value

Guru

@João Souza

This can be handled with FILTER, which keeps only the records that satisfy one or more conditions.

There are two ways to do this.

The first is to use FILTER, as below:

X = FILTER Count by Field >10; 
Y = FILTER Count by Field <=10; 

The second way achieves the same result with different syntax, using SPLIT:

SPLIT Count into X if Field >10, Y if Field <=10;
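
When the two conditions are exact complements, SPLIT also accepts an OTHERWISE branch, so the second condition does not have to be spelled out. A minimal sketch, assuming Pig 0.10 or later and the same Count relation and Field column used above:

```pig
-- Assumes Count has a numeric column named Field, as in the lines above.
-- Rows with Field > 10 go to X; the remaining rows go to Y.
SPLIT Count INTO X IF Field > 10, Y OTHERWISE;
```

How rows with a null Field are routed by OTHERWISE is worth checking against the documentation for your Pig version.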

Please note that SUM operates on a bag, so it requires a GROUP operation beforehand. In your case, you would need to GROUP data before summing it, which the first line of your code does not do.

It would have to look something like the following.

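data = LOAD ... AS (amt:int, name:chararray);
-- Collect all rows that share a name into one bag
grouped_data = GROUP data BY name;
-- After GROUP, the key is exposed as "group"; rename it back to name
summed_data = FOREACH grouped_data GENERATE group AS name, SUM(data.amt) AS amtSum;
X = FILTER summed_data BY amtSum > 10;
Y = FILTER summed_data BY amtSum <= 10;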
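
To get the STORE behaviour asked about in the question, each resulting relation can be written out with its own STORE statement. The following is only a sketch: the input path, the comma delimiter, and the output paths are hypothetical, since the original post does not say where the data comes from or where it should go.

```pig
-- Hypothetical input path and delimiter; the original LOAD source is not given.
data = LOAD 'input/data.csv' USING PigStorage(',') AS (amt:int, name:chararray);

-- Sum amt per name; the grouping key is exposed as "group" after GROUP.
grouped_data = GROUP data BY name;
summed_data = FOREACH grouped_data GENERATE group AS name, SUM(data.amt) AS amtSum;

-- Route each per-name sum to one of two relations.
SPLIT summed_data INTO X IF amtSum > 10, Y IF amtSum <= 10;

-- Hypothetical output locations; replace with real paths.
STORE X INTO 'output/x';
STORE Y INTO 'output/y';
```

Pig has no IF/ELSE around STORE, so both outputs are always written; the conditions only decide which rows land in each one.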

See:

(Let me know if this is what you are looking for by accepting the answer).


