Support Questions

Find answers, ask questions, and share your expertise

Question PIG and HIVE Code

avatar
New Contributor

Hi, i need to know how to create a code in both PIG and Hive for this data: 

FrankUAM_0-1666022726138.png

I need to obtain the sum of the newcases for the continents, as well as which country has had the highest number of cases in each continent and their number. somethig like:

FrankUAM_2-1666023196310.png

 

For pig i was trying somethig like: 

FrankUAM_1-1666022911623.png

Any one knows how to do it?

 

Thanks.

1 ACCEPTED SOLUTION

avatar
Master Collaborator

You can use the following HQL (SQL) query to find the sum of new cases for each continent and identify the country with the highest number of cases in each continent along with the corresponding case count:

 

 

WITH ContinentSum AS (
  SELECT
    continent,
    SUM(new_cases) AS total_new_cases
  FROM
    sample_table_test
  GROUP BY
    continent
),

CountryMaxCases AS (
  SELECT
    continent,
    location AS country,
    MAX(total_case) AS max_cases
  FROM
    sample_table_test
  GROUP BY
    continent, location
)

SELECT
  cs.continent,
  cs.total_new_cases,
  cm.country,
  cm.max_cases
FROM
  ContinentSum cs
JOIN
  CountryMaxCases cm
ON
  cs.continent = cm.continent
  AND cs.total_new_cases = cm.max_cases;

 

 

This query first calculates the sum of new cases for each continent in the ContinentSum CTE (Common Table Expression). Then, it finds the country with the highest total cases in each continent using the CountryMaxCases CTE. Finally, it joins the results from both CTEs to provide the desired output.


Sample resultset for the shared data.

 

 

+---------------+---------------------+--------------+---------------+
| cs.continent  | cs.total_new_cases  |  cm.country  | cm.max_cases  |
+---------------+---------------------+--------------+---------------+
| Asia          | 25.0                | Afghanistan  | 25.0          |
+---------------+---------------------+--------------+---------------+

 

 



View solution in original post

1 REPLY 1

avatar
Master Collaborator

You can use the following HQL (SQL) query to find the sum of new cases for each continent and identify the country with the highest number of cases in each continent along with the corresponding case count:

 

 

WITH ContinentSum AS (
  SELECT
    continent,
    SUM(new_cases) AS total_new_cases
  FROM
    sample_table_test
  GROUP BY
    continent
),

CountryMaxCases AS (
  SELECT
    continent,
    location AS country,
    MAX(total_case) AS max_cases
  FROM
    sample_table_test
  GROUP BY
    continent, location
)

SELECT
  cs.continent,
  cs.total_new_cases,
  cm.country,
  cm.max_cases
FROM
  ContinentSum cs
JOIN
  CountryMaxCases cm
ON
  cs.continent = cm.continent
  AND cs.total_new_cases = cm.max_cases;

 

 

This query first calculates the sum of new cases for each continent in the ContinentSum CTE (Common Table Expression). Then, it finds the country with the highest total cases in each continent using the CountryMaxCases CTE. Finally, it joins the results from both CTEs to provide the desired output.


Sample resultset for the shared data.

 

 

+---------------+---------------------+--------------+---------------+
| cs.continent  | cs.total_new_cases  |  cm.country  | cm.max_cases  |
+---------------+---------------------+--------------+---------------+
| Asia          | 25.0                | Afghanistan  | 25.0          |
+---------------+---------------------+--------------+---------------+