Member since 01-15-2016 | 2 Posts | 0 Kudos Received | 0 Solutions
01-15-2016
07:37 AM
I think I figured this out. In this case, I needed to use the other flavor of explode in the second operation:

    val ydf = xdf.explode("nar", "gname") { nar: Seq[String] => nar }

Always happens, as soon as you ask the question publicly...
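For anyone landing here later, this is a minimal end-to-end sketch of the two-step approach, not a definitive implementation. It assumes Spark 1.x (where DataFrame.explode is available) running in local mode. The HumanName and Patient case classes and the sample rows are hypothetical stand-ins for the much larger schema in the question; only Nar and the two explode calls come from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical, trimmed-down stand-ins for the Patient schema in the question.
case class HumanName(family: Seq[String], given: Seq[String])
case class Patient(resourceId: String, name: Seq[HumanName])

// Output type of the first explode (taken from the original post).
case class Nar(nar: Seq[String])

object ExplodeGivenNames {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("explode-given").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample data, for illustration only.
    val patdf = sc.parallelize(Seq(
      Patient("p1", Seq(HumanName(Seq("Smith"), Seq("John", "Q")))),
      Patient("p2", Seq(HumanName(Seq("Doe"), Seq("Jane"))))
    )).toDF()

    // Step 1: "name" is an array of struct, so each element arrives as a Row
    // and the Row-based flavor of explode applies.
    val xdf = patdf.explode($"name") {
      case Row(names: Seq[Row]) =>
        names.map { n =>
          Nar(n(n.fieldIndex("given")).asInstanceOf[Seq[String]])
        }
    }

    // Step 2: "nar" is an array of String, so its elements are plain Strings,
    // not Rows. The typed (inputColumn, outputColumn) flavor handles this.
    val ydf = xdf.explode("nar", "gname") { nar: Seq[String] => nar }

    // One output row per individual given name.
    ydf.select($"resourceId", $"gname").show()

    sc.stop()
  }
}
```

The first explode keeps the Row-based form because the elements of "name" are structs. The second switches to the typed form because the elements of "nar" are plain Strings, which is why matching them as Seq[Row] throws the ClassCastException. Note that a record whose "given" array is null is not handled in this sketch.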
01-15-2016
04:49 AM
This info is very helpful, but I've got a twist that I can't seem to figure out. I got this working for a single level of depth, but I'm a Scala noob and multiple levels have me stuck.

Below is the schema of my DataFrame (modelled after the HL7 FHIR DSTU2 specification, read from a parquet file). I am trying to explode out the individual values in the "given" field of the "name" struct array (so, a nested array). However, after the initial explode of the name array, the field I exploded to (called "nar") is not an array of struct; it is simply an array of String, which I think is what trips up the explode() method. I've tried a number of different approaches but haven't found the right combination. I need to be able to test each individual "given" name (and the other values in the struct) against the values in other records (name matching). Do you think this is possible?

    case class Nar(nar: Seq[String])
    case class Gname(gname: String)

    val xdf = patdf.explode($"name") {
      case Row(name: Seq[Row]) =>
        name.map { name => Nar(name(name.fieldIndex("given")).asInstanceOf[Seq[String]]) }
    }

    val ydf = xdf.explode($"nar") {
      case Row(nar: Seq[Row]) =>
        nar.map { nar => Gname(nar(0).asInstanceOf[String]) }
    }

    ydf.select($"gname").foreach(println)

    ...
    Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1$$anonfun$apply$1.apply(<console>:32)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    ...
root
 |-- PatientCareProvider: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- UniqueId: string (nullable = true)
 |-- active: boolean (nullable = true)
 |-- address: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- addressLine: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- city: string (nullable = true)
 |    |    |-- country: string (nullable = true)
 |    |    |-- period: struct (nullable = true)
 |    |    |    |-- startTime: struct (nullable = true)
 |    |    |    |    |-- value: long (nullable = true)
 |    |    |-- postalCode: string (nullable = true)
 |    |    |-- region: string (nullable = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- use: string (nullable = true)
 |-- birthDate: struct (nullable = true)
 |    |-- value: long (nullable = true)
 |-- gender: string (nullable = true)
 |-- link: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- linkType: string (nullable = true)
 |    |    |-- other: struct (nullable = true)
 |    |    |    |-- display: string (nullable = true)
 |    |    |    |-- reference: string (nullable = true)
 |-- managingOrganization: struct (nullable = true)
 |    |-- display: string (nullable = true)
 |    |-- reference: string (nullable = true)
 |-- name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- family: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- given: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- period: struct (nullable = true)
 |    |    |    |-- startTime: struct (nullable = true)
 |    |    |    |    |-- value: long (nullable = true)
 |    |    |-- prefix: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- suffix: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- use: string (nullable = true)
 |-- resourceId: string (nullable = true)
 |-- telecom: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- period: struct (nullable = true)
 |    |    |    |-- startTime: struct (nullable = true)
 |    |    |    |    |-- value: long (nullable = true)
 |    |    |-- system: string (nullable = true)
 |    |    |-- use: string (nullable = true)
 |    |    |-- value: string (nullable = true)

Then with "given" exploded as "nar":

 |-- nar: array (nullable = true)
 |    |-- element: string (containsNull = true)

Then with "given" exploded as "gname":

 |-- gname: string (nullable = true)
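Given a schema shaped like this, another possible route is the column-level explode function from org.apache.spark.sql.functions instead of the DataFrame.explode method. The sketch below is untested against this data and assumes a Spark version in which a generator can appear next to other columns in select. The helper name givenNames and the alias "n" are made up for illustration; the column names come from the schema above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper: returns one row per (resourceId, given name) pair.
def givenNames(patdf: DataFrame): DataFrame =
  patdf
    // First level: one row per element of the "name" array (a struct).
    .select(col("resourceId"), explode(col("name")).as("n"))
    // Second level: one row per String in that struct's "given" array.
    .select(col("resourceId"), explode(col("n.given")).as("gname"))
```

Because the intermediate column "n" is still the whole struct, other fields such as n.family or n.use could be selected alongside gname for name matching. Rows whose array is null or empty produce no output with this function, so patients without a "given" value would drop out of the result.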