<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Explode function in Data Frames in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/36300#M5773</link>
    <description>&lt;P&gt;I think I figured this out. &amp;nbsp;In this case, I needed to use the other flavor of explode in the second operation:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val ydf = xdf.explode("nar", "gname") { nar: Seq[String] =&amp;gt; nar }&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Always happens, as soon as you ask the question publicly...&lt;/P&gt;</description>
    <pubDate>Fri, 15 Jan 2016 15:37:48 GMT</pubDate>
    <dc:creator>bontempi</dc:creator>
    <dc:date>2016-01-15T15:37:48Z</dc:date>
    <item>
      <title>Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/27045#M5767</link>
      <description>&lt;P&gt;We have a nested Parquet file with the sample structure below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;department_id&amp;nbsp; String&lt;/P&gt;&lt;P&gt;department_name String&lt;/P&gt;&lt;P&gt;Employees Array&amp;lt;Struct&amp;lt;first_name String, last_name String, email String&amp;gt;&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We want to flatten the above structure using the explode API of DataFrames. The samples we found in the documentation and on GitHub all explode a String by splitting it, but here we have an Array structure. We could not find any examples of this on the web either, though I could be missing something.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could anyone please help with using the explode method on a nested array structure?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I referred to the API and GitHub links below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame"&gt;https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala"&gt;https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;Siva&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:27:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/27045#M5767</guid>
      <dc:creator>SivaBollineni</dc:creator>
      <dc:date>2022-09-16T09:27:58Z</dc:date>
    </item>
    <item>
      <title>Re: Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/27120#M5768</link>
      <description>&lt;P&gt;Hey Siva-&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is Chris Fregly from Databricks. &amp;nbsp;I just talked to my co-worker, Michael Armbrust (Spark SQL, Catalyst, DataFrame guru), and we came up with the&amp;nbsp;code sample below. &amp;nbsp;Hopefully, this is what you're looking for.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Michael admits that this is a bit verbose, so&amp;nbsp;he may implement a more condensed `explodeArray()` method on DataFrame at some point.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;// Row is needed for the pattern match in the explode call further down
import org.apache.spark.sql.Row

case class Employee(firstName: String, lastName: String, email: String)
case class Department(id: String, name: String)
case class DepartmentWithEmployees(department: Department, employees: Seq[Employee])

val employee1 = new Employee("michael", "armbrust", "abc123@prodigy.net")
val employee2 = new Employee("chris", "fregly", "def456@compuserve.net")

val department1 = new Department("123456", "Engineering")
val department2 = new Department("123456", "Psychology")
val departmentWithEmployees1 = new DepartmentWithEmployees(department1, Seq(employee1, employee2))
val departmentWithEmployees2 = new DepartmentWithEmployees(department2, Seq(employee1, employee2))

val departmentWithEmployeesRDD = sc.parallelize(Seq(departmentWithEmployees1, departmentWithEmployees2))
departmentWithEmployeesRDD.toDF().saveAsParquetFile("dwe.parquet")

val departmentWithEmployeesDF = sqlContext.parquetFile("dwe.parquet")
// This would be replaced by explodeArray()
val explodedDepartmentWithEmployeesDF = departmentWithEmployeesDF.explode(departmentWithEmployeesDF("employees")) {
	// Each input Row wraps the employees array; emit one Employee per array element
	case Row(employees: Seq[Row @unchecked]) =&gt; employees.map(employee =&gt;
		Employee(employee.getString(0), employee.getString(1), employee.getString(2))
	)
}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
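      <!--
Editor's sketch (not part of the original reply): Spark 1.4 and later also ship a column function, org.apache.spark.sql.functions.explode, that emits one output row per array element without a hand-written Row pattern match. The version below assumes Spark 2.x or later with a local SparkSession; the object name, the app name, and the output column aliases are illustrative choices, and the case classes are copied from the reply above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

case class Employee(firstName: String, lastName: String, email: String)
case class Department(id: String, name: String)
case class DepartmentWithEmployees(department: Department, employees: Seq[Employee])

object ExplodeEmployeesSketch {
  // Turn one row per department into one row per (department, employee) pair.
  def flatten(df: DataFrame): DataFrame =
    df.select(col("department"), explode(col("employees")).as("employee"))
      .select(
        col("department.id").as("department_id"),
        col("department.name").as("department_name"),
        col("employee.firstName").as("first_name"),
        col("employee.lastName").as("last_name"),
        col("employee.email").as("email"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
    import spark.implicits._

    val employees = Seq(
      Employee("michael", "armbrust", "abc123@prodigy.net"),
      Employee("chris", "fregly", "def456@compuserve.net"))
    val df = Seq(
      DepartmentWithEmployees(Department("123456", "Engineering"), employees),
      DepartmentWithEmployees(Department("123456", "Psychology"), employees)).toDF()

    flatten(df).show(false)
    spark.stop()
  }
}
```

Each department row is repeated once per employee, so the two departments with two employees each should yield four flattened rows.
      -->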
      <pubDate>Thu, 07 May 2015 16:04:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/27120#M5768</guid>
      <dc:creator>chrisf</dc:creator>
      <dc:date>2015-05-07T16:04:23Z</dc:date>
    </item>
    <item>
      <title>Re: Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/27121#M5769</link>
      <description>&lt;P&gt;Hi Chris and&amp;nbsp;&lt;SPAN&gt;Michael,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks for your quick response and solution. Let me try it on our data model, and I will update this thread with the outcome. explodeArray() would be a great option. Nevertheless, this solution is really cool. Thanks again for your time.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Siva&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 May 2015 16:12:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/27121#M5769</guid>
      <dc:creator>SivaBollineni</dc:creator>
      <dc:date>2015-05-07T16:12:32Z</dc:date>
    </item>
    <item>
      <title>Re: Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/32722#M5770</link>
      <description>&lt;P&gt;I have a similar situation, and am having trouble coding it in Scala due to my limited knowledge of Scala.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my case, the data is read from an Avro file. I used my debugger to find out the structure of the Avro file.&lt;/P&gt;&lt;P&gt;It is pretty complicated as you can see below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;[Report_Header: struct&amp;lt;Report_Name:string,Report_Date_Time:string,Aircraft_Type_Number:string,Aircraft_Serial_Number:string,Aircraft_Tail_Number:string,Version_Number:string,Tables_Part_Number:string&amp;gt;, PFS: array&amp;lt;struct&amp;lt;PFS_Header:struct&amp;lt;Flight_Leg:string,Flight_Number:string,From:string,Start:string,To:string,End:string&amp;gt;,Flight_Deck_Effects:struct&amp;lt;Count:string,FDE:array&amp;lt;struct&amp;lt;Equation_ID:string,Message_Text:string,Status:string,Occurrences:string,Recurrences:string,Fault_Code:string,Flight_Phase:string,Logged_Date_Time:string,Associated_Fault_Messages:struct&amp;lt;Fault_Message:array&amp;lt;struct&amp;lt;Equation_ID:string,Message_Text:string,Status:string,Occurrences:string,Recurrences:string,Fault_Code:string,Flight_Phase:string,Logged_Date_Time:string,Parameter_Snapshot:string&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;,Uncorrelated_Fault_Messages:struct&amp;lt;Count:string,Fault_Message:array&amp;lt;struct&amp;lt;Equation_ID:string,Message_Text:string,Status:string,Occurrences:string,Recurrences:string,Fault_Code:string,Flight_Phase:string,Logged_Date_Time:string,Parameter_Snapshot:string&amp;gt;&amp;gt;&amp;gt;,Service_Messages:struct&amp;lt;Count:string,Service_Message:array&amp;lt;struct&amp;lt;Equation_ID:string,Message_Text:string,Status:string,Occurrences:string,Recurrences:string,Fault_Code:string,Flight_Phase:string,Logged_Date_Time:string&amp;gt;&amp;gt;&amp;gt;,Aircraft_Servicing:struct&amp;lt;ENGINES:struct&amp;lt;LEFT_ENGINE:struct&amp;lt;Oil_Level:string,Oil_Filt_Bypass:string,Oil_Filt_Impend_Bypass:string,Fuel_Filt_Bypass:string,Fuel_Filt_Impend_Bypass:string&amp;gt;,RIGHT_ENGINE:struct&amp;lt;Oil_Level:string,Oil_Filt_Bypass:string,Oil_Filt_Impend_Bypass:string,Fuel_Filt_Bypass:string,Fuel_Filt_Impend_Bypass:string&amp;gt;&amp;gt;,APU:struct&amp;lt;HOURS:string,CYCLES:string,Oil_Level:string,Lube_Filt_Bypass:string,Generator_Filt_Bypass:string&amp;gt;,TIRE_PRESSURES:struct&amp;lt;Press:struct&amp;lt;LEFT_NLG:string,RIGHT_NLG:string,LOB_MLG:string,LIB_MLG:string,RIB_MLG:string,ROB_MLG:string&amp;gt;,Temp:struct&amp;lt;LEFT_NLG:string,RIGHT_NLG:string,LOB_MLG:string,LIB_MLG:string,RIB_MLG:string,ROB_MLG:string&amp;gt;&amp;gt;,BRAKES:struct&amp;lt;Wear:struct&amp;lt;LOB:string,LIB:string,RIB:string,ROB:string&amp;gt;,Cycles_Prediction:struct&amp;lt;LOB:string,LIB:string,RIB:string,ROB:string&amp;gt;&amp;gt;,HYDRAULICS:struct&amp;lt;Level:struct&amp;lt;SYS1:string,SYS2:string,SYS3:string&amp;gt;,Temperature:struct&amp;lt;SYS1:string,SYS2:string,SYS3:string&amp;gt;&amp;gt;,FUEL_Quantity:struct&amp;lt;LEFT:string,CENTER:string,RIGHT:string,TOTAL:string&amp;gt;,FIDEX:struct&amp;lt;SMOKE_DET_STATUS:struct&amp;lt;Fwd_Cargo_SD1:string,Fwd_Cargo_SD2:string,Fwd_Cargo_SD3:string,Fwd_Cargo_SD4:string,Aft_Cargo_SD1:string,Aft_Cargo_SD2:string,Aft_Cargo_SD3:string,Aft_Cargo_SD4:string,Eqpt_Bay_SD1:string,Eqpt_Bay_SD2:string,IFE_Bay_SD1:string,IFE_Bay_SD2:string,Fwd_Lavatory_SD:string,Lavatory_C_SD:string,Lavatory_D_SD:string,Lavatory_E_SD:string&amp;gt;&amp;gt;,WATER_WASTE:struct&amp;lt;Waste_Tank_Level:string,Potable_Water_Level:string&amp;gt;,CREW_O2:struct&amp;lt;Bottle_Press:string&amp;gt;&amp;gt;&amp;gt;&amp;gt;]&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;While I do not want the entire solution, I just would like some help getting started on this.
Let's just take the first structure:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;[Report_Header: struct&amp;lt;Report_Name:string,Report_Date_Time:string,Aircraft_Type_Number:string,Aircraft_Serial_Number:string,Aircraft_Tail_Number:string,Version_Number:string,Tables_Part_Number:string&amp;gt;&lt;/FONT&gt;]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I were to just expand this into separate columns, what would my explode() function look like?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is my attempt, but I am stuck with the actual implementation of the explode() function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; case class Report_Header( Report_Name: String,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Report_Date_Time: String,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Aircraft_Type_Number: String,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Aircraft_Serial_Number: String,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier 
new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Aircraft_Tail_Number: String,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Version_Number: String,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Tables_Part_Number: String)&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; def testAvro(inputFile: String, outputFile: String, context: SparkContext): Unit = {&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; val sqlContext = new SQLContext(context)&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; val pfsDetailedReport = sqlContext.read&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .format("com.databricks.spark.avro")&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .load(inputFile)&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; val explodedPfsDetailedReport = pfsDetailedReport.explode(pfsDetailedReport("Report_Header")) 
{&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Stuck here - please help&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; case Row(Report_header: Seq[Row @unchecked]) =&amp;gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; header.map(??? =&amp;gt; ???)&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Example below taken from above&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // case Row(employee: Seq[Row @unchecked]) =&amp;gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; //&amp;nbsp; employee.map(employee =&amp;gt; Employee(employee(0).asInstanceOf[String], employee(1).asInstanceOf[String], employee(2).asInstanceOf[String]))&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; explodedPfsDetailedReport.write&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .format("com.databricks.spark.avro")&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .save(outputFile)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; }&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Any help will be highly appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Oct 2015 18:47:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/32722#M5770</guid>
      <dc:creator>anupambagchi</dc:creator>
      <dc:date>2015-10-06T18:47:10Z</dc:date>
    </item>
    <item>
      <title>Re: Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/32764#M5771</link>
      <description>&lt;P&gt;To answer my own question, here are two links that may be useful.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/julianpeeters/sbt-avrohugger" target="_blank"&gt;https://github.com/julianpeeters/sbt-avrohugger&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/julianpeeters/avro-scala-macro-annotations" target="_blank"&gt;https://github.com/julianpeeters/avro-scala-macro-annotations&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In short, you need to build a case class for every Avro structure and write an apply() method to convert a Spark SQL Row object to your case class. The links above will help with the former. The latter has to be done by hand, but I think it should be possible to generate code for the Row-to-object conversion as well.&lt;/P&gt;</description>
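      <!--
Editor's sketch (not from the original post): one way the hand-written Row to case class conversion described above might look for the Report_Header struct from the earlier question. The case class mirrors the seven string fields listed in that schema; the names ReportHeader and fromRow, the demo object, and the sample values are hypothetical. It needs the Spark SQL library on the classpath for org.apache.spark.sql.Row, but no running cluster.

```scala
import org.apache.spark.sql.Row

case class ReportHeader(
    reportName: String,
    reportDateTime: String,
    aircraftTypeNumber: String,
    aircraftSerialNumber: String,
    aircraftTailNumber: String,
    versionNumber: String,
    tablesPartNumber: String)

object ReportHeader {
  // Positional conversion: the field order must match the struct's schema order.
  def fromRow(row: Row): ReportHeader = ReportHeader(
    row.getString(0),
    row.getString(1),
    row.getString(2),
    row.getString(3),
    row.getString(4),
    row.getString(5),
    row.getString(6))
}

object ReportHeaderDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder values, not taken from any real report.
    val row = Row("sample report", "sample time", "type", "serial", "tail", "version", "part")
    println(ReportHeader.fromRow(row))
  }
}
```

Because Report_Header is a struct rather than an array, there is only one Row to convert per record for this field; the array fields such as PFS are the ones that would produce several rows each.
      -->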
      <pubDate>Wed, 07 Oct 2015 22:02:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/32764#M5771</guid>
      <dc:creator>anupambagchi</dc:creator>
      <dc:date>2015-10-07T22:02:26Z</dc:date>
    </item>
    <item>
      <title>Re: Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/36291#M5772</link>
      <description>&lt;P&gt;This info is very helpful, but I've got a twist that I can't seem to figure out.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I got this working for a single level of depth, but as a Scala noob I'm struggling with multiple levels. &amp;nbsp;Below is the schema of my DataFrame (modelled after the HL7 FHIR DSTU2 specification, read from a Parquet file). &amp;nbsp;I am trying to explode out the individual values in the "given" field of the "name" struct array (so, a nested array). &amp;nbsp;However, after the initial explode of the name array, the field I exploded to (called "nar") is not an array of struct but simply an array of String, which I think is what the explode() method is choking on. &amp;nbsp;I've tried a number of different approaches, but haven't found the right combination. &amp;nbsp;I need to be able to test each individual "given" name (and other values in the struct) against the corresponding values in other records (name matching). &amp;nbsp;Do you think this is possible?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;case class Nar(nar: Seq[String])&lt;/P&gt;&lt;P&gt;case class Gname(gname: String)&lt;BR /&gt;val xdf = patdf.explode($"name")&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;{case Row(name: Seq[Row]) =&amp;gt; name.map{name =&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Nar( name( name.fieldIndex("given") ).asInstanceOf[Seq[String]] )}}&lt;/P&gt;&lt;P&gt;val ydf = xdf.explode($"nar")&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;{case Row(nar: Seq[Row]) =&amp;gt; nar.map{nar =&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Gname(nar(0).asInstanceOf[String])}}&lt;/P&gt;&lt;P&gt;ydf.select($"gname").foreach(println)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;Caused by: java.lang.ClassCastException: 
java.lang.String cannot be cast to org.apache.spark.sql.Row&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1$$anonfun$apply$1.apply(&amp;lt;console&amp;gt;:32)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;root&lt;BR /&gt;&amp;nbsp; &amp;nbsp;|-- PatientCareProvider: array (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- element: string (containsNull = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;|-- UniqueId: string (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;|-- active: boolean (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;|-- address: array (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- element: struct (containsNull = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- addressLine: array (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- element: string (containsNull = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- city: string (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- country: string (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- period: struct (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- startTime: struct (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- value: long (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- postalCode: string (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| 
&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- region: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- text: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- use: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- birthDate: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- value: long (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- gender: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- link: array (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- element: struct (containsNull = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- linkType: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- other: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- display: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- reference: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- 
managingOrganization: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- display: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- reference: string (nullable = true)&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp;|-- name: array (nullable = true)&lt;/STRONG&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- element: struct (containsNull = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- family: array (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- element: string (containsNull = true)&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- given: array (nullable = true)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- element: string (containsNull = true)&lt;/STRONG&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- period: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- startTime: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; 
&amp;nbsp;&lt;/SPAN&gt;|-- value: long (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- prefix: array (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- element: string (containsNull = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- suffix: array (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- element: string (containsNull = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- text: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- use: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- resourceId: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- telecom: array (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- element: struct (containsNull = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- period: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| 
&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- startTime: struct (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- value: long (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- system: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- use: string (nullable = true)&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;| &lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;|-- value: string (nullable = true)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then with "given" exploded as "nar":&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;|-- nar: array (nullable = true)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;|-- element: string (containsNull = true)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then with "given" exploded as "gname":&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;|-- gname: string (nullable = true)&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jan 2016 12:49:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/36291#M5772</guid>
      <dc:creator>bontempi</dc:creator>
      <dc:date>2016-01-15T12:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: Explode function in Data Frames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/36300#M5773</link>
      <description>&lt;P&gt;I think I figured this out. &amp;nbsp;In this case, I needed to use the other flavor of explode in the second operation:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val ydf = xdf.explode("nar", "gname") { nar: Seq[String] =&amp;gt; nar }&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Always happens, as soon as you ask the question publicly...&lt;/P&gt;</description>
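      <!--
Editor's sketch (not from the original post): the two explode steps behave like nested flatMap calls over ordinary collections, which may make it clearer why the second step receives plain strings rather than Row objects. This models the idea with plain Scala collections instead of Spark; HumanName, Patient, the method names, and the sample names are hypothetical stand-ins for the schema in the question.

```scala
// Plain Scala model of the two explode steps; no Spark involved.
case class HumanName(family: Seq[String], givenNames: Seq[String])
case class Patient(uniqueId: String, name: Seq[HumanName])

object NestedExplodeModel {
  // First step: one element per entry of the name array, keeping only its list of given names.
  def explodeNames(patients: Seq[Patient]): Seq[Seq[String]] =
    patients.flatMap(p => p.name.map(n => n.givenNames))

  // Second step: the elements are already Strings, so returning the sequence unchanged is enough,
  // mirroring the function { nar: Seq[String] => nar } from the post.
  def explodeGiven(nars: Seq[Seq[String]]): Seq[String] =
    nars.flatMap(nar => nar)

  def main(args: Array[String]): Unit = {
    val patients = Seq(
      Patient("p1", Seq(HumanName(Seq("Smith"), Seq("Anna", "Maria")))),
      Patient("p2", Seq(
        HumanName(Seq("Jones"), Seq("Bob")),
        HumanName(Seq("Jones"), Seq("Robert")))))

    println(explodeGiven(explodeNames(patients))) // prints List(Anna, Maria, Bob, Robert)
  }
}
```

In this model the failing attempt corresponds to treating each String as if it were a nested record and indexing into it, which is where the ClassCastException in the question came from.
      -->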
      <pubDate>Fri, 15 Jan 2016 15:37:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Explode-function-in-Data-Frames/m-p/36300#M5773</guid>
      <dc:creator>bontempi</dc:creator>
      <dc:date>2016-01-15T15:37:48Z</dc:date>
    </item>
  </channel>
</rss>

