Support Questions
Find answers, ask questions, and share your expertise

finding items that customers bought together from transactions

Highlighted

finding items that customers bought together from transactions

New Contributor

I am working on mysql but need to replicate some queries on hive.

I have a table in this form transaction table

7900-capture.jpg

I would like to retrieve the following info: Resultant table

7901-capture1.jpg

On mysql, the below query works:

<code>SELECT c.original_item_id, c.bought_with_item_id, count(*)as times_bought_together
FROM(SELECT a.item_id as original_item_id, b.item_id as bought_with_item_id
  FROM items a
  INNERjoin items b
  ON a.transaction_id = b.transaction_id AND a.item_id != b.item_id where original_item_id in('B','C')) c
GROUPBY c.original_item_id, c.bought_with_item_id;
<code>
<em>
But I am not able to translate this to a hive query, I have tried lot of shuffling joins and replacing where with on conditions but have not got the necessary results. Would be great if I can find some help on this</em>
3 REPLIES 3
Highlighted

Re: finding items that customers bought together from transactions

Super Guru

@Vivek S when you built you new hive query and ran it what error did you get?

Highlighted

Re: finding items that customers bought together from transactions

@Vivek S

try this:

SELECT a.item_id as item_A, b.item_id as item_B , count(1) as count_pair
FROM items a
INNER JOIN items b ON a.transaction_id = b.transaction_id
WHERE a.item_id > b.item_id
GROUP BY a.item_id, b.item_id
HAVING count(1) > 1

Re: finding items that customers bought together from transactions

Hi @Vivek S. The issue you are having with your query is that Hive does not support non-equi-joins.

So to accomplish your query you need to move the non-equi-join clause out of the INNER JOIN section, and do it as part of the Where clause instead.

So here is the correct query:

SELECT c.original_item_id, c.bought_with_item_id, count(*) as times_bought_together
FROM
   (SELECT a.item_id as original_item_id, b.item_id as bought_with_item_id
    FROM   items a
      INNER join items b
        ON a.transaction_id = b.transaction_id  
    WHERE  a.item_id in('B','C')
      AND  a.item_id != b.item_id
   ) c
GROUP BY c.original_item_id, c.bought_with_item_id;

which gives the expected results:

+---------------------+------------------------+------------------------+--+
| c.original_item_id  | c.bought_with_item_id  | times_bought_together  |
+---------------------+------------------------+------------------------+--+
| B                   | A                      | 2                      |
| B                   | C                      | 2                      |
| C                   | A                      | 1                      |
| C                   | B                      | 2                      |
+---------------------+------------------------+------------------------+--+
4 rows selected (1.48 seconds)

Hope this helps.

Don't have an account?