Support Questions

Find answers, ask questions, and share your expertise

Hive query to check mathematical values from multiple tables

avatar
Contributor

Hi all, 

I'm looking to create a query to check and validate the math on certain fields from different tables. I've found examples where the SUM function was used- but I struggled putting it together, and am just trying to find the best way to do these calcuations.

Sample data:

Customer table:

Customer IDMoney Spent
0001500
0002250
0003100

Sales table:

Order IDAmount
5005500
5006250
500750

Check sales math work in progress query(the query doesn't seem to work, returns a "delta" column with a {null} result:

SELECT

CAST(SUM(Customer.Money Spent - Sales.Amount) AS INT) AS DELTA

The desired output that I'd like:

Check table

 Rule NameDelta
Sales Check0
Sales Check0
Sales Check50

Thanks in advance,

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Hive can't understand (20) as negative value. You should explicitly mention -20 in column value.

ie., 

+----------------+---------------+---------------+---------------+---------------+
| test.currency  | test.spend_a  | test.spend_b  | test.spend_c  | test.spend_d  |
+----------------+---------------+---------------+---------------+---------------+
| GBP            | 50            | 25            | -20           | 5             |
| JPY            | 50            | 25            | -20           | 5             |
| AUD            | 50            | 25            | -20           | 5             |
+----------------+---------------+---------------+---------------+---------------+

select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test;
+-----------+--------------+
| currency  | total_spend  |
+-----------+--------------+
| GBP       | 60.0         |
| JPY       | 60.0         |
| AUD       | 60.0         |
+-----------+--------------+

View solution in original post

6 REPLIES 6

avatar
Expert Contributor

If you want to join the tables, you should have primary-foreign key relation.

ie., sales table should have customer_id column or customer table should have order_id column to map rows between 2 tables. Otherwise this is not achievable.

avatar
Contributor

Understood- so okay, added Customer ID to Sales table;

 

Customer table:

Customer IDMoney Spent
0001500
0002250
0003100

Sales table:

Customer IDOrder IDAmount
00015005500
00025006250
0003500750

Check sales math work in progress query(the query doesn't seem to work, returns a "delta" column with a {null} result:

SELECT

CAST(SUM(Customer.Money Spent - Sales.Amount) AS INT) AS DELTA

The desired output that I'd like:

Check table

 Rule NameDelta
Sales Check0
Sales Check0
Sales Check50

avatar
Expert Contributor

I dint see the full query in your description. Are you aggregating (SUM) delta column based on customer_id ? 

I did the following simple repro without aggregates & able to get the results as expected.

create external table Customer(customer_id string, money_spent bigint) stored as textfile;
insert into customer values("0001", 500), ("0002", 250), ("0003", 100);

create external table sales(customer_id string, order_id string, amount bigint) stored as textfile;
insert into sales values("0001", "5005", 500), ("0002", "5006", 250), ("0003", "5007", 50);

select "Sales Check" as rule, (a.money_spent - b.amount) as delta from customer a join sales b on a.customer_id = b.customer_id;
+--------------+--------+
| rule | delta |
+--------------+--------+
| Sales Check | 0 |
| Sales Check | 0 |
| Sales Check | 50 |
+--------------+--------+
3 rows selected (15.186 seconds)

Let me know if this helps.

avatar
Contributor

Thanks @nramanaiah. I realized that I gave a bad example before. Apologies.

Table CustCurr:

CurrencySpend_ASpend_BSpend_CSpend_D
GBP5025(20)5
JPY5025(20)5
AUD5025(20)5

The Output that I'd like to get(as a check- can't create temp\tables as i only have read access) is:

Currencytotal_spend_checkA_and_CB_C_D
GBP803010
JPY803010
AUD803010

The query that I'm trying is:

SELECT Currency, SUM(Spend_A + Spend_B, + Spend_C + Spend_D) AS total_spend_check

FROM

CustCurr

WHERE

DateofAction = '2024-05-01'

GROUP BY Currency

Here's the incorrect result that I get with above query:

currencytotal_spend_check
GBPnull
JPYnull
AUDnull

I've tried coalesce, cast(field as int) with no avail.

 

 

avatar
Expert Contributor

Hive can't understand (20) as negative value. You should explicitly mention -20 in column value.

ie., 

+----------------+---------------+---------------+---------------+---------------+
| test.currency  | test.spend_a  | test.spend_b  | test.spend_c  | test.spend_d  |
+----------------+---------------+---------------+---------------+---------------+
| GBP            | 50            | 25            | -20           | 5             |
| JPY            | 50            | 25            | -20           | 5             |
| AUD            | 50            | 25            | -20           | 5             |
+----------------+---------------+---------------+---------------+---------------+

select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test;
+-----------+--------------+
| currency  | total_spend  |
+-----------+--------------+
| GBP       | 60.0         |
| JPY       | 60.0         |
| AUD       | 60.0         |
+-----------+--------------+

avatar
Contributor

Thanks! because i have null values in my data set as well, i used colaesce, and it worked! Your query was the basis though, so thanks again!
Query by @nramanaiah that worked for me, as I have null records in the dataset:

select Currency, (coalesce(spend_a,0)) + (colaesce(spend_b,0)) + coalesce(spend_c,0)) + coalesce(spend_d,0)) as total_spend from test