
How to generate the output in specific schema format using apache spark for a given scenario?


Hi All,

 

Scenario:

I am writing Spark code in Scala to profile structured data (Hive tables). The input to the code will always be a Hive table. I can iterate over each column of the input table (df.schema.foreach), but I am not clear on how to store the profiling results in a Hive table, in the format below, within the same iteration.

Result table:

tablename, columnname, profilername, profilervalue

 

This table will contain data like:

Table1,col1,null_count,20

Table1,col2,unique_count,20

Table1,col1,unique_count,20

 

 

Please note that a new row should be added to the result table for each result (i.e. for each column).

Thanks in advance. Please guide me on how the Spark code should be structured.
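
For illustration, here is one possible structure, offered as a minimal sketch rather than a definitive solution. Instead of writing to Hive inside the column loop, it builds one aggregate expression per (column, profiler) pair, computes them all in a single pass, and then reshapes the single result row into one output row per pair. The names ColumnProfiler, ProfileRow, profile and the target table profiling_results are hypothetical. The sketch also assumes the input has at least one column, that column names contain no dots or other special characters, and that profilervalue is stored as a string.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, countDistinct, lit, when}

object ColumnProfiler {
  // One output row per (column, profiler) pair, matching the result table layout.
  // profilervalue is kept as a string (an assumption) so that profilers with
  // different result types can share one column.
  case class ProfileRow(tablename: String, columnname: String,
                        profilername: String, profilervalue: String)

  def profile(spark: SparkSession, tableName: String, df: DataFrame): DataFrame = {
    import spark.implicits._

    // Two example profilers per column: (column name, profiler name, aggregate).
    val metrics: Seq[(String, String, Column)] = df.columns.toSeq.flatMap { c =>
      Seq(
        (c, "null_count", count(when(col(c).isNull, lit(1)))), // rows where c is null
        (c, "unique_count", countDistinct(col(c)))             // distinct non-null values
      )
    }

    // A single aggregation computes every metric in one pass over the data.
    // Assumes df has at least one column, otherwise exprs.head fails.
    val exprs = metrics.map(_._3)
    val values = df.agg(exprs.head, exprs.tail: _*).first()

    // Reshape the single wide result row into one narrow row per metric.
    val rows = metrics.zipWithIndex.map { case ((column, profiler, _), i) =>
      ProfileRow(tableName, column, profiler, values.get(i).toString)
    }
    rows.toDS().toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-profiler")
      .enableHiveSupport()
      .getOrCreate()

    val tableName = args(0) // input Hive table, passed as the first argument
    profile(spark, tableName, spark.table(tableName))
      .write.mode("append")
      .saveAsTable("profiling_results") // hypothetical result table name

    spark.stop()
  }
}
```

Further profilers could be added by appending more tuples to the inner Seq. Collecting the rows and writing once per input table avoids launching a separate Hive write for every column, which is likely to be slow on wide tables.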

