Support Questions
Find answers, ask questions, and share your expertise

using count and joins across multiple tables

Highlighted

using count and joins across multiple tables

Expert Contributor

Example:

Table 1:

    id     name     
    1        sim    
    2        sam
    3        jas 

Table 2

key age

1 10

1 20

2 40

3 10

Table 3:

id         rating     
2            7     
2           6     
3          8     
3          7     
1         9 

Now, what I need is number of rows that were grouped using group by in both the tables i.e table A and table B. .

Something like this:

select t1.id, count(t2.key) as a, count(t3.id) as b
    FROM 
    table1 t1
    LEFT JOIN 
    table2 t2
    ON 
    t1.id = t2.key
    LEFT JOIN 
    table3 t3
    ON 
    t1.id = t3.id
    GROUP BY t1.key, t2.id

Result I expect:

    id      a     b 
    1       2     1
    2       1      2 
    3       1       2
6 REPLIES 6
Highlighted

Re: using count and joins across multiple tables

@Simran Kaur Try using Pig.

https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#JOIN+%28outer%29

If you are unable to do so in a single query in hive do so in multiple step by step simple statements in PIG. Pig will internally compact and tune the statements.

Highlighted

Re: using count and joins across multiple tables

Expert Contributor

@rbiswas: I would first like to know if it's possible to do it in hive at all

Highlighted

Re: using count and joins across multiple tables

Highlighted

Re: using count and joins across multiple tables

Explorer

You should be able to do this using subqueries (caveat: I haven't tried this directly in Hive, but it works in Postgres):

select tbl1.id,
       tbl2.age_count,
       tbl3.rating_count,
  from users as tbl1,
       (select i1.key, count(i1.age) age_count from age as i1 group by i1.key) as tbl2,
       (select i2.id, count(i2.rating) rating_count from rating as i2 group by i2.id) as tbl2
 where tbl1.id = tbl2.key
   and tbl1.id = tbl3.id
Highlighted

Re: using count and joins across multiple tables

Expert Contributor

@Simran

You can get the result by doing this. its a simple aggregation logic.

select (id), sum(a), sum(b) from
(
  select id, count(*) a, 0 b from t1, t2
where t1.id = t2.key
group by id
union 
select t1.id, 0 a, count(*) b from t1, t3
where t1.id=t3.id
group by t1.id
)c
group by id
order by id;
Highlighted

Re: using count and joins across multiple tables

Expert Contributor

This logic will work on Hive, SQL, and its faster as it performs just sum and counts and unions