
How to explode arrays without losing null values in Spark



I have a DataFrame that I am trying to flatten. As part of the process I want to explode it, so that for a column of arrays, each element of the array produces a separate row.

 

My DataFrame has the columns tradeid, tradedate, and schedule. The schedule column is an array, so I query the DataFrame as below.

hivecontext.sql("select tradeid, tradedate, explode(schedule) from tempDf")

 

With this query I am losing the trade rows where schedule is not present. I have found some solutions online that handle this using the DataFrame API, but that is not what I want: I have implemented the entire data extraction using SQL queries on the DataFrame, and I can't afford to redevelop it using the DataFrame DSL.
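
To make the row loss concrete, here is a minimal plain-Python sketch (no Spark involved) that models what explode appears to do to the rows, next to an "outer" variant that keeps rows whose array is missing. The function name `explode_rows`, the `keep_empty` flag, and the sample trades are all hypothetical and only illustrate the behavior described above.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def explode_rows(rows: Iterable[Row], column: str, keep_empty: bool = False) -> Iterator[Row]:
    """Yield one output row per element of row[column].

    With keep_empty=False, rows whose array is None or empty yield nothing,
    which mirrors the row loss described in the post. With keep_empty=True,
    such rows are kept once, with None in place of the array element.
    """
    for row in rows:
        values = row.get(column)
        rest = {k: v for k, v in row.items() if k != column}
        if values:
            for value in values:
                yield {**rest, column: value}
        elif keep_empty:
            yield {**rest, column: None}


# Hypothetical sample data: trade 2 has no schedule.
trades = [
    {"tradeid": 1, "tradedate": "2016-01-04", "schedule": ["a", "b"]},
    {"tradeid": 2, "tradedate": "2016-01-05", "schedule": None},
]

inner = list(explode_rows(trades, "schedule"))                   # trade 2 disappears
outer = list(explode_rows(trades, "schedule", keep_empty=True))  # trade 2 is kept
```

The second call is the behavior being asked for: every trade survives, and trades without a schedule carry a null in the exploded column.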

 

I think I will need to write a custom explode function that I can use in my SQL query to solve this. Can someone please help?
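
Before writing a custom function, one thing that may be worth trying is Hive's LATERAL VIEW OUTER clause, which the Hive documentation describes as still emitting a row (with NULL in the generated column) when explode produces nothing. The sketch below assumes the HiveContext in Spark 1.6 accepts this syntax, which has not been verified here. The aliases `exploded` and `schedule_item` and the helper `run_outer_explode` are hypothetical names chosen for illustration.

```python
# Assumption: the HiveContext parses Hive's LATERAL VIEW OUTER syntax.
# The aliases "exploded" and "schedule_item" are illustrative only.
OUTER_EXPLODE_QUERY = """
SELECT tradeid, tradedate, schedule_item
FROM tempDf
LATERAL VIEW OUTER explode(schedule) exploded AS schedule_item
"""


def run_outer_explode(sql_context):
    """Run the query on any object exposing a .sql(query) method,
    for example the hivecontext used in the post."""
    return sql_context.sql(OUTER_EXPLODE_QUERY)
```

If this works on 1.6, the approach stays entirely in SQL, so the existing extraction queries would not need to be rewritten in the DataFrame DSL.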

I am using Spark 1.6.