Created 04-10-2025 03:28 PM
Hi, new to Hive and SQL. So please be patient.
I've got a dataset with 2 tables:
1. organisations - contains names of the organisations and an id
2. relationships - contains id of the organisations which organisation is it related to, ie a superior
i've got a query to show the immediate relationship next to each other:
SELECT sub.organisation_id, sub.organisation_name, sup.organisation_id, sup.organisation_name
FROM organisation sup
JOIN relationship
ON sup.organisation_id = relationship_id
JOIN organisation sub
ON organisation_id = relationship_orgid
How do I query this using Hive (or Impala) to display all the organisations in the ancestry? E.g.,
baby -> mother -> grandmother -> great-grandmother -> great-great-grandmother -> etc.
Created 04-10-2025 03:59 PM
@JediBrooker Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive experts @ggangadharan @Shmoo @james_jones who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,Created 04-10-2025 05:16 PM
Hi @JediBrooker ,
The query that you could use should be something like this:
WITH org_hierarchy_anchor AS (
SELECT
sup.organisation_id AS sup_id,
sup.organisation_name AS sup_name,
sub.organisation_id AS sub_id,
sub.organisation_name AS sub_name,
CAST(sub.organisation_name AS STRING) AS hierarchy_path
FROM organisation sup
JOIN relationship r ON sup.organisation_id = r.relationship_id
JOIN organisation sub ON r.relationship_orgid = sub.organisation_id
),
org_hierarchy_recursive AS (
SELECT
sup_id,
sup_name,
sub_id,
sub_name,
hierarchy_path
FROM org_hierarchy_anchor
UNION ALL
SELECT
oh.sup_id,
oh.sup_name,
s.organisation_id AS sub_id,
s.organisation_name AS sub_name,
CONCAT(oh.hierarchy_path, '->', s.organisation_name) AS hierarchy_path
FROM org_hierarchy_recursive oh
JOIN relationship r ON oh.sub_id = r.relationship_id
JOIN organisation s ON r.relationship_orgid = s.organisation_id
WHERE INSTR(oh.hierarchy_path, CAST(s.organisation_name AS STRING)) = 0
)
SELECT sup_id, sup_name, sub_id, sub_name, hierarchy_path
FROM org_hierarchy_recursive
ORDER BY sup_id, sub_id;
Please, check the column names and table names, I just used what you have provided here.
Let me know if works.
Best,
Cristian Barrueco
Created 04-10-2025 10:44 PM
To provide the exact HQL query, Please share the following :
The above information will help to understand the problem statement better and validate the solution.