<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Parsing XML in Spark RDD in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152812#M48840</link>
    <description>&lt;P&gt;Do I need to add the Databricks package to the Spark classpath? I am new to Spark and am struggling to understand how to use the package. Also, is there any other way to parse an XML file and generate a CSV without using the Databricks package?&lt;/P&gt;</description>
    <pubDate>Thu, 15 Dec 2016 13:58:38 GMT</pubDate>
    <dc:creator>rajdip_chaudhur</dc:creator>
    <dc:date>2016-12-15T13:58:38Z</dc:date>
    <item>
      <title>Parsing XML in Spark RDD</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152810#M48838</link>
      <description>&lt;P&gt;Hi guys,
We have a use case that requires parsing XML files with Spark RDDs. We found some examples that use the spark-xml utilities at the link below.
&lt;A href="https://github.com/databricks/spark-xml" target="_blank"&gt;https://github.com/databricks/spark-xml&lt;/A&gt; &lt;/P&gt;&lt;P&gt;There are some examples there. However, could you also provide some sample code for this?
Also, could you please explain how an external package can be added from spark-shell and pyspark?
We are looking forward to your guidance.
Thanks,
Rajdip&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 19:42:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152810#M48838</guid>
      <dc:creator>rajdip_chaudhur</dc:creator>
      <dc:date>2016-12-14T19:42:55Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing XML in Spark RDD</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152811#M48839</link>
      <description>&lt;P&gt;See: &lt;A href="https://github.com/databricks/spark-xml" target="_blank"&gt;https://github.com/databricks/spark-xml&lt;/A&gt;&lt;/P&gt;&lt;H3&gt;GitHub has examples like:&lt;/H3&gt;&lt;PRE&gt;import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.databricks.spark.xml")
    .option("rowTag", "book")
    .load("books.xml")

val selectedData = df.select("author", "_id")
selectedData.write
    .format("com.databricks.spark.xml")
    .option("rootTag", "books")
    .option("rowTag", "book")
    .save("newbooks.xml")&lt;/PRE&gt;&lt;H3&gt;Spark compiled with Scala 2.10&lt;/H3&gt;&lt;PRE&gt;&lt;CODE&gt;$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.4.1&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 15 Dec 2016 04:48:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152811#M48839</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-12-15T04:48:07Z</dc:date>
    </item>
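Since the question also asked about pyspark: the same --packages flag can be passed to the pyspark launcher, and the read shown in the Scala example has a direct Python equivalent. The following is only a sketch under assumptions: the helper name read_books is made up for illustration, and books.xml with its book, author and _id fields comes from the spark-xml README example rather than from the poster's data.

```python
# Launch the shell with the package on the classpath first, for example:
#   $SPARK_HOME/bin/pyspark --packages com.databricks:spark-xml_2.10:0.4.1
# Inside the shell, sqlContext (or spark on Spark 2.x) already exists.


def read_books(sql_context, path="books.xml"):
    """Read XML with spark-xml and keep two columns (hypothetical helper).

    sql_context can be a SQLContext or a SparkSession; both expose .read.
    """
    df = (
        sql_context.read
        .format("com.databricks.spark.xml")
        .option("rowTag", "book")  # each book element becomes one row
        .load(path)
    )
    # Attributes are exposed with a leading underscore, hence "_id".
    return df.select("author", "_id")
```

In the pyspark shell this would be called as read_books(sqlContext), or read_books(spark) on Spark 2.x.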
    <item>
      <title>Re: Parsing XML in Spark RDD</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152812#M48840</link>
      <description>&lt;P&gt;Do I need to add the Databricks package to the Spark classpath? I am new to Spark and am struggling to understand how to use the package. Also, is there any other way to parse an XML file and generate a CSV without using the Databricks package?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 13:58:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152812#M48840</guid>
      <dc:creator>rajdip_chaudhur</dc:creator>
      <dc:date>2016-12-15T13:58:38Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing XML in Spark RDD</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152813#M48841</link>
      <description>&lt;P&gt;As mentioned in the answer above, the command line to add the package to your job is:&lt;/P&gt;&lt;PRE&gt;$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.4.1
&lt;/PRE&gt;&lt;P&gt;To write your project code, you will also need to add this package as a dependency in your project's Maven pom. If you build an uber jar for your project that includes this package, then you don't need to change your command line for submission.&lt;/P&gt;&lt;P&gt;There are many more packages for Spark that you can browse at spark-packages.org.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 04:02:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152813#M48841</guid>
      <dc:creator>bikas</dc:creator>
      <dc:date>2016-12-16T04:02:51Z</dc:date>
    </item>
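On the open question of parsing XML and generating a CSV without the databricks package: for files small enough to handle on one machine, Python's standard library may already be enough. Below is a minimal sketch, not a Spark solution. It assumes a books document shaped like the spark-xml example (book rows with an id attribute and author and title children); the sample data and the helper names are made up for illustration, and the sample document is built in code so the snippet runs on its own.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_sample_xml():
    """Build a tiny books document in memory (hypothetical sample data)."""
    root = ET.Element("books")
    for book_id, author, title in [
        ("bk101", "Author One", "First Title"),
        ("bk102", "Author Two", "Second Title"),
    ]:
        book = ET.SubElement(root, "book", {"id": book_id})
        ET.SubElement(book, "author").text = author
        ET.SubElement(book, "title").text = title
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text, row_tag="book"):
    """Return one dict per row_tag element: its attributes plus child text."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(row_tag):
        row = dict(elem.attrib)
        for child in elem:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = xml_to_rows(build_sample_xml())
csv_text = rows_to_csv(rows, ["id", "author", "title"])
print(csv_text)
```

For a real file, ET.parse(path).getroot() would replace ET.fromstring, and the CSV text could be written to disk with a normal file handle. For large inputs spread across a cluster, the package route described above is likely the better fit.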
    <item>
      <title>Re: Parsing XML in Spark RDD</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152814#M48842</link>
      <description>&lt;P&gt;@Timothy Spann....&lt;/P&gt;&lt;P&gt;Do we not have a solution to parse/read XML without the databricks package? I work on HDP 2.0+, Spark 2.1.&lt;/P&gt;&lt;P&gt;I am trying to parse XML manually in PySpark code, but I am having difficulty when converting the list to a DataFrame.&lt;/P&gt;&lt;P&gt;Any advice? Let me know; I can post the script here.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Feb 2019 04:47:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Parsing-XML-in-Spark-RDD/m-p/152814#M48842</guid>
      <dc:creator>powerofk76</dc:creator>
      <dc:date>2019-02-08T04:47:45Z</dc:date>
    </item>
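Without the script it is hard to say where the list-to-DataFrame conversion fails. One possibility, and this is only a guess, is that manually parsed records do not all carry the same fields, so the rows end up with different lengths. The sketch below normalizes a list of dicts into equal-length tuples before handing them to createDataFrame. The column names, the sample records and both helper names are assumptions for illustration; to_dataframe expects an existing SparkSession and is defined here but not called.

```python
COLUMNS = ["id", "author", "title"]  # assumed column names, not from the thread


def normalize(records, columns=COLUMNS):
    """Turn dicts with possibly missing keys into equal-length tuples.

    Missing fields become None so every row has one value per column.
    """
    return [tuple(rec.get(col) for col in columns) for rec in records]


def to_dataframe(spark, records, columns=COLUMNS):
    """Build a DataFrame from normalized rows; spark is an existing SparkSession."""
    return spark.createDataFrame(normalize(records, columns), schema=columns)


# Hypothetical output of a manual parsing step; the second record has no title.
parsed = [
    {"id": "bk101", "author": "Author One", "title": "First Title"},
    {"id": "bk102", "author": "Author Two"},
]

rows = normalize(parsed)
print(rows)
```

In a pyspark shell on Spark 2.x this would be used as to_dataframe(spark, parsed). If a column is None in every row, type inference cannot determine its type, and an explicit StructType schema would be needed instead of the plain list of names.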
  </channel>
</rss>

