New Contributor
Posts: 1
Registered: ‎07-08-2017

How to explode arrays without losing null values in Spark

I have a DataFrame that I am trying to flatten. As part of the process, I want to explode it: if a column contains arrays, each element of an array should produce a separate row.

 

My DataFrame has the columns tradeid, tradedate, and schedule. schedule is an array, so I query the DataFrame as below.

hivecontext.sql("select tradeid, tradedate, explode(schedule) from tempDf")

 

With this code I am losing the trade rows in which schedule is not present (null). I have found some solutions online that handle this with the DataFrame API, but that's not what I want: I have implemented this entire data extraction using SQL queries on the DataFrame, and I can't afford to redevelop it using the DataFrame DSL.
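To make the problem concrete, here is a small plain-Python sketch (not Spark code) of what I see today versus what I would like to get. The sample rows and the function names `explode` and `explode_keep_nulls` are made up for illustration, and I am assuming an empty array behaves the same way as a null one.

```python
# Plain-Python model of the two behaviours; this does not call Spark.
# The sample rows below are hypothetical.
rows = [
    {"tradeid": 1, "tradedate": "2017-07-01", "schedule": ["s1", "s2"]},
    {"tradeid": 2, "tradedate": "2017-07-02", "schedule": None},  # no schedule
    {"tradeid": 3, "tradedate": "2017-07-03", "schedule": []},    # empty schedule
]


def explode(rows):
    """What I get today: one output row per array element, so a row whose
    array is null or empty contributes nothing to the result."""
    out = []
    for r in rows:
        for item in (r["schedule"] or []):
            out.append((r["tradeid"], r["tradedate"], item))
    return out


def explode_keep_nulls(rows):
    """What I want: same as above, but a null or empty array still yields
    one row, with None in the exploded column."""
    out = []
    for r in rows:
        items = r["schedule"] or [None]
        for item in items:
            out.append((r["tradeid"], r["tradedate"], item))
    return out
```

In this model, `explode(rows)` returns rows only for trade 1, while `explode_keep_nulls(rows)` also keeps trades 2 and 3 with an empty schedule value. The second behaviour is what I need from the SQL query.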

 

I think I will need to write a custom explode function that I can use in my SQL query to solve this. Can someone please help?

The Spark version I am using is 1.6.
