Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive query to check mathematical values from multiple tables

avatar
Explorer

Hi all, 

I'm looking to create a query to check and validate the math on certain fields from different tables. I've found examples where the SUM function was used- but I struggled putting it together, and am just trying to find the best way to do these calcuations.

Sample data:

Customer table:

Customer IDMoney Spent
0001500
0002250
0003100

Sales table:

Order IDAmount
5005500
5006250
500750

Check sales math work in progress query(the query doesn't seem to work, returns a "delta" column with a {null} result:

SELECT

CAST(SUM(Customer.Money Spent - Sales.Amount) AS INT) AS DELTA

The desired output that I'd like:

Check table

 Rule NameDelta
Sales Check0
Sales Check0
Sales Check50

Thanks in advance,

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Hive can't understand (20) as negative value. You should explicitly mention -20 in column value.

ie., 

+----------------+---------------+---------------+---------------+---------------+
| test.currency  | test.spend_a  | test.spend_b  | test.spend_c  | test.spend_d  |
+----------------+---------------+---------------+---------------+---------------+
| GBP            | 50            | 25            | -20           | 5             |
| JPY            | 50            | 25            | -20           | 5             |
| AUD            | 50            | 25            | -20           | 5             |
+----------------+---------------+---------------+---------------+---------------+

select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test;
+-----------+--------------+
| currency  | total_spend  |
+-----------+--------------+
| GBP       | 60.0         |
| JPY       | 60.0         |
| AUD       | 60.0         |
+-----------+--------------+

View solution in original post

6 REPLIES 6

avatar
Expert Contributor

If you want to join the tables, you should have primary-foreign key relation.

ie., sales table should have customer_id column or customer table should have order_id column to map rows between 2 tables. Otherwise this is not achievable.

avatar
Explorer

Understood- so okay, added Customer ID to Sales table;

 

Customer table:

Customer IDMoney Spent
0001500
0002250
0003100

Sales table:

Customer IDOrder IDAmount
00015005500
00025006250
0003500750

Check sales math work in progress query(the query doesn't seem to work, returns a "delta" column with a {null} result:

SELECT

CAST(SUM(Customer.Money Spent - Sales.Amount) AS INT) AS DELTA

The desired output that I'd like:

Check table

 Rule NameDelta
Sales Check0
Sales Check0
Sales Check50

avatar
Expert Contributor

I dint see the full query in your description. Are you aggregating (SUM) delta column based on customer_id ? 

I did the following simple repro without aggregates & able to get the results as expected.

create external table Customer(customer_id string, money_spent bigint) stored as textfile;
insert into customer values("0001", 500), ("0002", 250), ("0003", 100);

create external table sales(customer_id string, order_id string, amount bigint) stored as textfile;
insert into sales values("0001", "5005", 500), ("0002", "5006", 250), ("0003", "5007", 50);

select "Sales Check" as rule, (a.money_spent - b.amount) as delta from customer a join sales b on a.customer_id = b.customer_id;
+--------------+--------+
| rule | delta |
+--------------+--------+
| Sales Check | 0 |
| Sales Check | 0 |
| Sales Check | 50 |
+--------------+--------+
3 rows selected (15.186 seconds)

Let me know if this helps.

avatar
Explorer

Thanks @nramanaiah. I realized that I gave a bad example before. Apologies.

Table CustCurr:

CurrencySpend_ASpend_BSpend_CSpend_D
GBP5025(20)5
JPY5025(20)5
AUD5025(20)5

The Output that I'd like to get(as a check- can't create temp\tables as i only have read access) is:

Currencytotal_spend_checkA_and_CB_C_D
GBP803010
JPY803010
AUD803010

The query that I'm trying is:

SELECT Currency, SUM(Spend_A + Spend_B, + Spend_C + Spend_D) AS total_spend_check

FROM

CustCurr

WHERE

DateofAction = '2024-05-01'

GROUP BY Currency

Here's the incorrect result that I get with above query:

currencytotal_spend_check
GBPnull
JPYnull
AUDnull

I've tried coalesce, cast(field as int) with no avail.

 

 

avatar
Expert Contributor

Hive can't understand (20) as negative value. You should explicitly mention -20 in column value.

ie., 

+----------------+---------------+---------------+---------------+---------------+
| test.currency  | test.spend_a  | test.spend_b  | test.spend_c  | test.spend_d  |
+----------------+---------------+---------------+---------------+---------------+
| GBP            | 50            | 25            | -20           | 5             |
| JPY            | 50            | 25            | -20           | 5             |
| AUD            | 50            | 25            | -20           | 5             |
+----------------+---------------+---------------+---------------+---------------+

select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test;
+-----------+--------------+
| currency  | total_spend  |
+-----------+--------------+
| GBP       | 60.0         |
| JPY       | 60.0         |
| AUD       | 60.0         |
+-----------+--------------+

avatar
Explorer

Thanks! because i have null values in my data set as well, i used colaesce, and it worked! Your query was the basis though, so thanks again!
Query by @nramanaiah that worked for me, as I have null records in the dataset:

select Currency, (coalesce(spend_a,0)) + (colaesce(spend_b,0)) + coalesce(spend_c,0)) + coalesce(spend_d,0)) as total_spend from test