12-03-2015 08:49 AM
I have a load of case classes which I've used in Spark to save data as parquet, e.g.:

```scala
case class Person(
  userId: String,
  technographic: Option[Technographic] = None,
  geographic: Option[Geographic] = None)

case class Technographic(
  browsers: Seq[Browser],
  devices: Seq[Device],
  oss: Seq[Os])

case class Browser(
  family: Option[String] = None,
  major: Option[String] = None,
  language: Option[String] = None
  ...
```

How can I convert the data on disk back to these case classes? I need to be able to select multiple columns and explode them so that, for each list (e.g. `browsers`), all of the sub-lists have the same length. E.g. given this original data:

```scala
Person(
  userId = "1234",
  technographic = Some(Technographic(
    browsers = Seq(
      Browser(family = Some("IE"), major = Some("7"), language = Some("en")),
      Browser(family = None, major = None, language = Some("en-us")),
      Browser(family = Some("Firefox"), major = None, language = None)),
    ...)),
  geographic = Some(Geographic(...)))
```

I need, e.g., the browser data to come out as follows (as well as being able to select all columns):

```
family=IE,      major=7,    language=en
family=None,    major=None, language=en-us
family=Firefox, major=None, language=None
```

which I could get if Spark could `explode` each list item. Currently it just does something like this (and in any case `explode` won't work with multiple columns):

```
browsers.family = ["IE", "Firefox"]
browsers.major = ["7"]
browsers.language = ["en", "en-us"]
```
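For what it's worth, the closest I've come is zipping the parallel arrays back together myself inside a single `explode` call on the DataFrame. This is a rough, untested sketch, not a working solution: the `BrowserFlat` case class, the padding to the longest array, and the null handling are all my own guesses.

```scala
import org.apache.spark.sql.Row

// Flattened output row: one aligned (family, major, language) triple.
case class BrowserFlat(
  family: Option[String],
  major: Option[String],
  language: Option[String])

val flat = df.explode(
  df("technographic.browsers.family"),
  df("technographic.browsers.major"),
  df("technographic.browsers.language")) { row: Row =>
    // The Row holds just the three input columns, each an array.
    val families  = Option(row.getSeq[String](0)).getOrElse(Seq.empty)
    val majors    = Option(row.getSeq[String](1)).getOrElse(Seq.empty)
    val languages = Option(row.getSeq[String](2)).getOrElse(Seq.empty)
    // Pad to the longest array; null elements become None.
    val n = Seq(families.length, majors.length, languages.length).max
    (0 until n).map { i =>
      BrowserFlat(
        families.lift(i).flatMap(Option(_)),
        majors.lift(i).flatMap(Option(_)),
        languages.lift(i).flatMap(Option(_)))
    }
}
```

But this only lines the values up positionally, which isn't right once `None` values have been dropped from some of the arrays.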
So how can I reconstruct a user's record (the entire set of case classes that produced a row of data) from all this nested optional data using Spark 1.5.2?
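The only direction I've found so far is hand-writing `Row` decoders and mapping over the DataFrame. An untested sketch of the idea (it assumes `Browser`'s fields are all `Option[String]`, as my example instances suggest; `rowToBrowser`/`rowToPerson` are names I made up, the parquet path is a placeholder, and I've stubbed `devices`/`oss`/`geographic` since the pattern is the same):

```scala
import org.apache.spark.sql.Row

// Hypothetical decoders; field names mirror the case classes above.
def rowToBrowser(r: Row): Browser = Browser(
  family   = Option(r.getAs[String]("family")),   // Option(...) maps null -> None
  major    = Option(r.getAs[String]("major")),
  language = Option(r.getAs[String]("language")))

def rowToPerson(r: Row): Person = Person(
  userId = r.getAs[String]("userId"),
  technographic = Option(r.getAs[Row]("technographic")).map { t =>
    Technographic(
      browsers = Option(t.getAs[Seq[Row]]("browsers"))
                   .getOrElse(Seq.empty).map(rowToBrowser),
      devices  = Seq.empty,  // stub: same pattern as browsers
      oss      = Seq.empty)  // stub: same pattern as browsers
  },
  geographic = None)         // stub: same pattern as technographic

val people = sqlContext.read.parquet("/path/to/people").map(rowToPerson)
```

Writing one of these by hand for every case class is painfully boilerplate-heavy, so I'm hoping there's something built in that I've missed.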