Support Questions

Find answers, ask questions, and share your expertise
Announcements
Now Live: Explore expert insights and technical deep dives on the new Cloudera Community BlogsRead the Announcement

Hierarchy in Hive

avatar
New Member

Hi, new to Hive and SQL. So please be patient.

I've got a dataset with 2 tables:

1. organisations - contains names of the organisations and an id
2. relationships - contains id of the organisations which organisation is it related to, ie a superior

i've got a query to show the immediate relationship next to each other:

SELECT sub.organisation_id, sub.organisation_name, sup.organisation_id, sup.organisation_name
FROM organisation sup
JOIN relationship
ON sup.organisation_id = relationship_id
JOIN organisation sub
ON organisation_id = relationship_orgid

How do I query this using Hive (or Impala) to display all the organisations in the ancestry? E.g.,

baby -> mother -> grandmother -> great-grandmother -> great-great-grandmother -> etc.

 



3 REPLIES 3

avatar
Community Manager

@JediBrooker Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive experts @ggangadharan @Shmoo @james_jones who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Senior Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

Hi @JediBrooker ,

The query that you could use should be something like this:

 

   WITH org_hierarchy_anchor AS (
     SELECT 
       sup.organisation_id AS sup_id,
       sup.organisation_name AS sup_name,
       sub.organisation_id AS sub_id,
       sub.organisation_name AS sub_name,
       CAST(sub.organisation_name AS STRING) AS hierarchy_path
     FROM organisation sup
     JOIN relationship r ON sup.organisation_id = r.relationship_id
     JOIN organisation sub ON r.relationship_orgid = sub.organisation_id
   ),
   org_hierarchy_recursive AS (
     SELECT 
       sup_id,
       sup_name,
       sub_id,
       sub_name,
       hierarchy_path
     FROM org_hierarchy_anchor
     UNION ALL
     SELECT 
       oh.sup_id,
       oh.sup_name,
       s.organisation_id AS sub_id,
       s.organisation_name AS sub_name,
       CONCAT(oh.hierarchy_path, '->', s.organisation_name) AS hierarchy_path
     FROM org_hierarchy_recursive oh
     JOIN relationship r ON oh.sub_id = r.relationship_id
     JOIN organisation s ON r.relationship_orgid = s.organisation_id
     WHERE INSTR(oh.hierarchy_path, CAST(s.organisation_name AS STRING)) = 0
   )
   SELECT sup_id, sup_name, sub_id, sub_name, hierarchy_path
   FROM org_hierarchy_recursive
   ORDER BY sup_id, sub_id;

Please, check the column names and table names, I just used what you have provided here.

 

Let me know if works.

 

Best,

Cristian Barrueco 

avatar
Master Collaborator

To provide the exact HQL query, Please share the following :

  • DDL for both the tables
  • Sample records from each table 
  • The expected output based on the sample data. 

The above information will help to understand the problem statement better and validate the solution.