Hive tables group by operation help with spark



I am very new to the big data world. I have around 16 tables that need to be grouped, and a report needs to be sent to another system.
Below is an example. I need guidance/help with this task.

empno first-name last-name
0        fname     lname
1        fname1    lname1

empno dept-no dept-code
0        1         a
0        1         b
1        1         a
1        2         a

empno history-no address
0        1            xyz
0        2            abc
111231245613            a12

I have to generate a file that combines the rows from all the tables for each employee. The average emp-count is 200k.

Desired output:

seg-start emp-0
seg-emp 0-fname-lname
seg-dept 0-1-a
seg-dept 0-1-b
seg-his 0-1-xyz
seg-his 0-2-abc
seg-end emp-0
seg-start emp-1...... 

seg-end emp-1
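
One way to think about the desired output is as a per-employee grouping step followed by a fixed segment order. Below is a minimal plain-Python sketch of that logic, not a Spark job. The function name `build_report` and the tuple layouts are assumptions made for illustration, and the sample rows assume that a value such as `01` in the department table means empno 0, dept-no 1 (as the desired output suggests). The last history row is left out because its columns cannot be split reliably.

```python
from collections import defaultdict


def build_report(employees, departments, histories):
    """Return the report lines, grouped per employee.

    Assumed (hypothetical) row layouts:
      employees:   (empno, first_name, last_name)
      departments: (empno, dept_no, dept_code)
      histories:   (empno, history_no, address)
    """
    # Group the child rows by empno so each employee's rows stay together.
    depts_by_emp = defaultdict(list)
    for empno, dept_no, dept_code in departments:
        depts_by_emp[empno].append((dept_no, dept_code))

    hist_by_emp = defaultdict(list)
    for empno, history_no, address in histories:
        hist_by_emp[empno].append((history_no, address))

    lines = []
    for empno, first_name, last_name in sorted(employees):
        lines.append(f"seg-start emp-{empno}")
        lines.append(f"seg-emp {empno}-{first_name}-{last_name}")
        for dept_no, dept_code in sorted(depts_by_emp.get(empno, [])):
            lines.append(f"seg-dept {empno}-{dept_no}-{dept_code}")
        for history_no, address in sorted(hist_by_emp.get(empno, [])):
            lines.append(f"seg-his {empno}-{history_no}-{address}")
        lines.append(f"seg-end emp-{empno}")
    return lines


if __name__ == "__main__":
    # Sample rows taken from the example tables above.
    employees = [(0, "fname", "lname"), (1, "fname1", "lname1")]
    departments = [(0, 1, "a"), (0, 1, "b"), (1, 1, "a"), (1, 2, "a")]
    histories = [(0, 1, "xyz"), (0, 2, "abc")]
    print("\n".join(build_report(employees, departments, histories)))
```

In Spark, the same shape could probably be reached by keying every table on empno and grouping the keyed rows together (for example with the RDD `cogroup` or `groupWith` operations), then formatting each employee's group in the segment order shown here. With 16 tables and about 200k employees, the per-employee groups should stay small, but that is an assumption about the data rather than something stated in the question.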
