For example, I need to join two tables on one or more columns. This can be done entirely in HQL, or in a Spark Java program that performs the join using its own constructs. Can anyone please explain which approach is better in terms of performance, memory utilization, etc.?
As an example, consider table r1 with primary key ITEM_ID (about a billion records):
(ITEM_ID, ITEM_NAME, ITEM_UNIT, COMPANY_ID)
and table r2 with primary key COMPANY_ID (a few hundred records):
(COMPANY_ID, COMPANY_NAME, COMPANY_CITY)
I want to join r1 and r2 on COMPANY_ID.
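To make the intended result concrete, here is a minimal plain-Java sketch of the join semantics. It is not Spark or Hive API code: the class and record names (JoinSketch, Item, Company, Joined) are hypothetical and the sample rows are made up. It only illustrates an inner join on COMPANY_ID in which the small table r2 is held in an in-memory map while r1 is scanned once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {
    // Hypothetical row types mirroring the columns of r1 and r2.
    public record Item(long itemId, String itemName, String itemUnit, long companyId) {}
    public record Company(long companyId, String companyName, String companyCity) {}
    public record Joined(long itemId, String itemName, String itemUnit,
                         long companyId, String companyName, String companyCity) {}

    // Inner join of r1 and r2 on COMPANY_ID.
    // r2 (a few hundred rows) is loaded into a hash map; r1 is iterated once.
    public static List<Joined> join(Iterable<Item> r1, Iterable<Company> r2) {
        Map<Long, Company> companiesById = new HashMap<>();
        for (Company c : r2) {
            companiesById.put(c.companyId(), c);
        }

        List<Joined> result = new ArrayList<>();
        for (Item i : r1) {
            Company c = companiesById.get(i.companyId());
            if (c != null) { // rows of r1 without a matching company are dropped
                result.add(new Joined(i.itemId(), i.itemName(), i.itemUnit(),
                                      c.companyId(), c.companyName(), c.companyCity()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Item> r1 = List.of(
            new Item(1L, "bolt", "box", 10L),
            new Item(2L, "nut", "box", 20L),
            new Item(3L, "washer", "bag", 99L)); // no matching company
        List<Company> r2 = List.of(
            new Company(10L, "Acme", "Pune"),
            new Company(20L, "Globex", "Delhi"));

        for (Joined row : join(r1, r2)) {
            System.out.println(row);
        }
    }
}
```

The sketch keeps r2 in memory because it is tiny compared with r1; whether HQL or a Spark Java program does something equivalent for this case is exactly what I would like to understand.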