07-08-2017 11:59 PM
I have a DataFrame that I am trying to flatten. As part of the process, I want to explode it: if a column holds arrays, each element of the array should produce a separate row.
My DataFrame has the columns tradeid, tradedate, and schedule. Since schedule is an array, I query the DataFrame as below.
hivecontext.sql("select tradeid, tradedate, explode(schedule) from tempDf")
With this code I am losing the trade rows where schedule is not present. I have found some solutions online that handle this using the DataFrame API (DSL), but that's
not what I want, as I have implemented this entire data extraction using SQL queries on the DataFrame. I can't afford to redevelop it using the DSL.
I think I will need to write a custom explode function that I can use in my SQL query to solve this. Can someone please help?
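To make the requirement concrete, here is a minimal plain-Python sketch (not Spark code) of the behaviour I get today versus the behaviour I want. The sample rows, the function name explode_rows, and the keep_missing flag are all made up for illustration.

```python
# Plain-Python model of the row-level behaviour; this is NOT Spark code.
# The sample rows below are hypothetical.
rows = [
    {"tradeid": 1, "tradedate": "2017-07-01", "schedule": ["s1", "s2"]},
    {"tradeid": 2, "tradedate": "2017-07-02", "schedule": None},  # no schedule
    {"tradeid": 3, "tradedate": "2017-07-03", "schedule": []},    # empty schedule
]


def explode_rows(rows, keep_missing):
    """Emit one (tradeid, tradedate, schedule_entry) tuple per array element.

    keep_missing=False: trades with a null or empty schedule produce no
    output rows (what I see with explode today).
    keep_missing=True: such trades are kept once, with the schedule entry
    set to None (what I want).
    """
    out = []
    for row in rows:
        entries = row["schedule"] or []
        if not entries and keep_missing:
            entries = [None]
        for entry in entries:
            out.append((row["tradeid"], row["tradedate"], entry))
    return out


current = explode_rows(rows, keep_missing=False)  # trades 2 and 3 are dropped
wanted = explode_rows(rows, keep_missing=True)    # trades 2 and 3 are kept
```

In this model, current contains only rows for trade 1, while wanted also contains one row each for trades 2 and 3 with an empty schedule value.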
The Spark version I am using is 1.6.