Support Questions

Find answers, ask questions, and share your expertise

Hierarchy in Hive

avatar
New Contributor

Hi, new to Hive and SQL. So please be patient.

I've got a dataset with 2 tables:

1. organisations - contains names of the organisations and an id
2. relationships - contains id of the organisations which organisation is it related to, ie a superior

i've got a query to show the immediate relationship next to each other:

SELECT sub.organisation_id, sub.organisation_name, sup.organisation_id, sup.organisation_name
FROM organisation sup
JOIN relationship
ON sup.organisation_id = relationship_id
JOIN organisation sub
ON organisation_id = relationship_orgid

How do I query this using Hive (or Impala) to display all the organisations in the ancestry? E.g.,

baby -> mother -> grandmother -> great-grandmother -> great-great-grandmother -> etc.

 



3 REPLIES 3

avatar
Community Manager

@JediBrooker Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive experts @ggangadharan @Shmoo @james_jones who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

Hi @JediBrooker ,

The query that you could use should be something like this:

 

   WITH org_hierarchy_anchor AS (
     SELECT 
       sup.organisation_id AS sup_id,
       sup.organisation_name AS sup_name,
       sub.organisation_id AS sub_id,
       sub.organisation_name AS sub_name,
       CAST(sub.organisation_name AS STRING) AS hierarchy_path
     FROM organisation sup
     JOIN relationship r ON sup.organisation_id = r.relationship_id
     JOIN organisation sub ON r.relationship_orgid = sub.organisation_id
   ),
   org_hierarchy_recursive AS (
     SELECT 
       sup_id,
       sup_name,
       sub_id,
       sub_name,
       hierarchy_path
     FROM org_hierarchy_anchor
     UNION ALL
     SELECT 
       oh.sup_id,
       oh.sup_name,
       s.organisation_id AS sub_id,
       s.organisation_name AS sub_name,
       CONCAT(oh.hierarchy_path, '->', s.organisation_name) AS hierarchy_path
     FROM org_hierarchy_recursive oh
     JOIN relationship r ON oh.sub_id = r.relationship_id
     JOIN organisation s ON r.relationship_orgid = s.organisation_id
     WHERE INSTR(oh.hierarchy_path, CAST(s.organisation_name AS STRING)) = 0
   )
   SELECT sup_id, sup_name, sub_id, sub_name, hierarchy_path
   FROM org_hierarchy_recursive
   ORDER BY sup_id, sub_id;

Please, check the column names and table names, I just used what you have provided here.

 

Let me know if works.

 

Best,

Cristian Barrueco 

avatar
Master Collaborator

To provide the exact HQL query, Please share the following :

  • DDL for both the tables
  • Sample records from each table 
  • The expected output based on the sample data. 

The above information will help to understand the problem statement better and validate the solution.