id name 1 sim 2 sam 3 jas
id rating 2 7 2 6 3 8 3 7 1 9
Now, what I need is number of rows that were grouped using group by in both the tables i.e table A and table B. .
Something like this:
select t1.id, count(t2.key) as a, count(t3.id) as b FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.key LEFT JOIN table3 t3 ON t1.id = t3.id GROUP BY t1.key, t2.id
Result I expect:
id a b 1 2 1 2 1 2 3 1 2
@Simran Kaur Try using Pig.
If you are unable to do so in a single query in hive do so in multiple step by step simple statements in PIG. Pig will internally compact and tune the statements.
You should be able to do this using subqueries (caveat: I haven't tried this directly in Hive, but it works in Postgres):
select tbl1.id, tbl2.age_count, tbl3.rating_count, from users as tbl1, (select i1.key, count(i1.age) age_count from age as i1 group by i1.key) as tbl2, (select i2.id, count(i2.rating) rating_count from rating as i2 group by i2.id) as tbl2 where tbl1.id = tbl2.key and tbl1.id = tbl3.id
You can get the result by doing this. its a simple aggregation logic.
select (id), sum(a), sum(b) from ( select id, count(*) a, 0 b from t1, t2 where t1.id = t2.key group by id union select t1.id, 0 a, count(*) b from t1, t3 where t1.id=t3.id group by t1.id )c group by id order by id;
This logic will work on Hive, SQL, and its faster as it performs just sum and counts and unions