Created 10-17-2022 09:14 AM
Hi, i need to know how to create a code in both PIG and Hive for this data:
I need to obtain the sum of the newcases for the continents, as well as which country has had the highest number of cases in each continent and their number. somethig like:
For pig i was trying somethig like:
Any one knows how to do it?
Thanks.
Created on 10-03-2023 01:58 AM - edited 10-03-2023 01:58 AM
You can use the following HQL (SQL) query to find the sum of new cases for each continent and identify the country with the highest number of cases in each continent along with the corresponding case count:
WITH ContinentSum AS (
SELECT
continent,
SUM(new_cases) AS total_new_cases
FROM
sample_table_test
GROUP BY
continent
),
CountryMaxCases AS (
SELECT
continent,
location AS country,
MAX(total_case) AS max_cases
FROM
sample_table_test
GROUP BY
continent, location
)
SELECT
cs.continent,
cs.total_new_cases,
cm.country,
cm.max_cases
FROM
ContinentSum cs
JOIN
CountryMaxCases cm
ON
cs.continent = cm.continent
AND cs.total_new_cases = cm.max_cases;
This query first calculates the sum of new cases for each continent in the ContinentSum CTE (Common Table Expression). Then, it finds the country with the highest total cases in each continent using the CountryMaxCases CTE. Finally, it joins the results from both CTEs to provide the desired output.
Sample resultset for the shared data.
+---------------+---------------------+--------------+---------------+
| cs.continent | cs.total_new_cases | cm.country | cm.max_cases |
+---------------+---------------------+--------------+---------------+
| Asia | 25.0 | Afghanistan | 25.0 |
+---------------+---------------------+--------------+---------------+
Created on 10-03-2023 01:58 AM - edited 10-03-2023 01:58 AM
You can use the following HQL (SQL) query to find the sum of new cases for each continent and identify the country with the highest number of cases in each continent along with the corresponding case count:
WITH ContinentSum AS (
SELECT
continent,
SUM(new_cases) AS total_new_cases
FROM
sample_table_test
GROUP BY
continent
),
CountryMaxCases AS (
SELECT
continent,
location AS country,
MAX(total_case) AS max_cases
FROM
sample_table_test
GROUP BY
continent, location
)
SELECT
cs.continent,
cs.total_new_cases,
cm.country,
cm.max_cases
FROM
ContinentSum cs
JOIN
CountryMaxCases cm
ON
cs.continent = cm.continent
AND cs.total_new_cases = cm.max_cases;
This query first calculates the sum of new cases for each continent in the ContinentSum CTE (Common Table Expression). Then, it finds the country with the highest total cases in each continent using the CountryMaxCases CTE. Finally, it joins the results from both CTEs to provide the desired output.
Sample resultset for the shared data.
+---------------+---------------------+--------------+---------------+
| cs.continent | cs.total_new_cases | cm.country | cm.max_cases |
+---------------+---------------------+--------------+---------------+
| Asia | 25.0 | Afghanistan | 25.0 |
+---------------+---------------------+--------------+---------------+