Which is better: joining two tables using HQL (embedded in Spark Java) or using Java program constructs?

For example, I need to join two tables on one or more columns. This can be done entirely in HQL, or in Spark Java where the Java program performs the join using its own constructs. Can anyone please explain which approach is better in terms of performance, memory utilization, etc.?

As an example, consider table r1 with primary key ITEM_ID (with a billion records):

(ITEM_ID, ITEM_NAME, ITEM_UNIT, COMPANY_ID)

and table r2 with primary key COMPANY_ID (with a few hundred records):

(COMPANY_ID, COMPANY_NAME, COMPANY_CITY)

I want to join r1 and r2 on COMPANY_ID.
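
To make the "Java constructs" option concrete, here is a rough sketch of the idea in plain Java, with no Spark dependency. It is only an illustration under stated assumptions, not code from an actual job: the class name `CompanyJoinSketch`, the `join` method, the sample rows, and the choice to model rows as `String[]` are all hypothetical. Because r2 has only a few hundred rows, the sketch indexes r2 in a hash map and looks up each r1 row against it, which gives an inner join on COMPANY_ID. The HQL counterpart would be an ordinary `JOIN ... ON r1.COMPANY_ID = r2.COMPANY_ID`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: rows are modelled as String arrays.
public class CompanyJoinSketch {

    // r1 rows: (ITEM_ID, ITEM_NAME, ITEM_UNIT, COMPANY_ID)
    // r2 rows: (COMPANY_ID, COMPANY_NAME, COMPANY_CITY)
    // Returns inner-join rows:
    // (ITEM_ID, ITEM_NAME, ITEM_UNIT, COMPANY_ID, COMPANY_NAME, COMPANY_CITY)
    static List<String[]> join(List<String[]> r1, List<String[]> r2) {
        // Build phase: index the small table by its key, COMPANY_ID.
        Map<String, String[]> companies = new HashMap<>();
        for (String[] company : r2) {
            companies.put(company[0], company);
        }

        // Probe phase: a single pass over the large table.
        List<String[]> joined = new ArrayList<>();
        for (String[] item : r1) {
            String[] company = companies.get(item[3]);
            if (company != null) { // inner join: drop items with no matching company
                joined.add(new String[] {
                    item[0], item[1], item[2], item[3], company[1], company[2]
                });
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up sample rows, purely for demonstration.
        List<String[]> r1 = Arrays.asList(
            new String[] {"1", "bolt", "box", "10"},
            new String[] {"2", "nut", "bag", "99"});
        List<String[]> r2 = Arrays.asList(
            new String[] {"10", "Acme", "Pune"},
            new String[] {"20", "Globex", "Delhi"});

        for (String[] row : join(r1, r2)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

In this sketch only the few hundred r2 rows are held in memory, while r1 is read once from start to finish. Whether Spark ends up doing something comparable for the HQL version is exactly the part I am unsure about.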