Community Articles
Find and share helpful community-sourced technical articles
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Although it is a simple solution, there is not much reference to this exact problem in the www.

We have seen people doing multiple projection, flattening and group by to get the data into shape for storing into the final table using pig after the aggregation and group operations.

Source Table:

CREATE EXTERNAL TABLE IF NOT EXISTS source(
pk1  string,
pk2  string,
agg1 INT,
agg2 INT
)
STORED AS ORC tblproperties("orc.compress"="ZLIB");

Data:

8625-data.jpg

Target Table:

CREATE EXTERNAL TABLE IF NOT EXISTS target_aggregated(
pk1 string,
pk2 string,
sum_agg1 BIGINT,
sum_agg2 BIGINT
)
STORED AS ORC tblproperties("orc.compress"="ZLIB");

Pig Script:

--Load the data in pig relations
staging = LOAD 'DEFAULT.SOURCE' USING org.apache.hive.hcatalog.pig.HCatLoader();
--Group the data
group_staging = group staging BY (pk1,pk2);
--Flatten the grouped data and generate aggregates with same attribute names as the target table
calculate_group_staging = FOREACH group_staging GENERATE FLATTEN(group) AS(pk1,pk2),SUM(staging.agg1) as sum_agg1, SUM(staging.agg2) as sum_agg2;
--Order the data if required
calculate_group_staging_ordered = ORDER calculate_group_staging BY pk1,pk2;
--Store the data using HCatStorer
--Data will be automatically dereferenced by using the HCatalog metastore
STORE calculate_group_staging_ordered INTO 'DEFAULT.TARGET_AGGREGATED' USING org.apache.hive.hcatalog.pig.HCatStorer();

Final Results:

8628-fr.png

Hope this helps.

Thanks.

811 Views
Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 08:46 AM
Updated by:
 
Contributors
Top Kudoed Authors