Apache PIG - If Statement based on a count value
Labels: Apache Hadoop, Apache Pig
Created ‎09-04-2016 02:53 PM
Hi experts, I have this statement in Apache Pig: ... Count = FOREACH data GENERATE SUM(Field); ... How can I write an IF statement like this: IF (SUM(Field) > 10)
STORE into X; ELSE STORE into Y; Is it possible to do this? Many thanks!
Created ‎09-04-2016 07:24 PM
This can be done with FILTER, which selects the records that satisfy one or more conditions.
There are two ways to do this.
The first is to use FILTER, as below:
X = FILTER Count BY Field > 10;
Y = FILTER Count BY Field <= 10;
The second way achieves the same result with different syntax:
SPLIT Count INTO X IF Field > 10, Y IF Field <= 10;
Please note that SUM requires a GROUP operation beforehand. In your case, you would need to GROUP data before applying the SUM in your first line of code.
It would have to look something like the following.
data = LOAD ... AS (amt:int, name:chararray);
grouped_data = GROUP data BY name;
-- after GROUP, the grouping key is exposed as "group", so rename it back to name
summed_data = FOREACH grouped_data GENERATE group AS name, SUM(data.amt) AS amtSum;
X = FILTER summed_data BY amtSum > 10;
Y = FILTER summed_data BY amtSum <= 10;
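Since the question was about storing the results, here is a rough sketch of the whole flow using SPLIT followed by STORE. The input path, the tab delimiter, and the two output directories are placeholders I made up for illustration, so replace them with your own.

```pig
-- Hypothetical input: tab-separated lines of (amt, name). The path is an assumption.
data = LOAD 'input/amounts.tsv' USING PigStorage('\t') AS (amt:int, name:chararray);

-- SUM works on a bag, so group first; the key comes back as "group".
grouped_data = GROUP data BY name;
summed_data = FOREACH grouped_data GENERATE group AS name, SUM(data.amt) AS amtSum;

-- Route each summed record to X or Y depending on the threshold.
SPLIT summed_data INTO X IF amtSum > 10, Y IF amtSum <= 10;

-- Hypothetical output directories; they must not already exist.
STORE X INTO 'output/x';
STORE Y INTO 'output/y';
```

One thing to keep in mind: a record whose amtSum is null matches neither condition, so it ends up in neither X nor Y.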
See:
- https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SUM
- http://www.thomashenson.com/sum-field-apache-pig/
(Let me know if this is what you are looking for by accepting the answer).
