<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Writing avro files with a user defined schema in spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Writing-avro-files-with-a-user-defined-schema-in-spark/m-p/228073#M61001</link>
    <description>&lt;P&gt;A simple Spark app demonstrating how to read and write data in the Parquet and Avro formats.&lt;/P&gt;&lt;P&gt;I hope this can help.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/sryza/simplesparkavroapp" target="_blank"&gt;https://github.com/sryza/simplesparkavroapp&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 16 May 2017 06:41:27 GMT</pubDate>
    <dc:creator>abbtah</dc:creator>
    <dc:date>2017-05-16T06:41:27Z</dc:date>
    <item>
      <title>Writing avro files with a user defined schema in spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Writing-avro-files-with-a-user-defined-schema-in-spark/m-p/228072#M61000</link>
      <description>&lt;P&gt;I have a corpus of structured data stored in HDFS as a set of avro files. I need to do some processing to split this set into multiple sets based on the value of a certain field within the data set. This will involve splitting out the individual records based on the data element, bundling them up as new avro files and storing them into separate directories. I have tested a solution with Spark (v2.1.0) using the databricks spark-avro library (v2_11.3.2.0). It performs well, but when I write the data set into new avro files, it applies a spark-avro generated schema. The data types match, but I miss out an certain schema customizations, such as default values and descriptions. &lt;/P&gt;&lt;P&gt;Has anyone successfully applied a user-defined schema when writing avro files with spark-avro (or another similar Spark library)? I have found surprisingly little while searching.&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2017 23:12:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Writing-avro-files-with-a-user-defined-schema-in-spark/m-p/228072#M61000</guid>
      <dc:creator>nicholas_pettyj</dc:creator>
      <dc:date>2017-05-11T23:12:15Z</dc:date>
    </item>
    <item>
      <title>Re: Writing avro files with a user defined schema in spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Writing-avro-files-with-a-user-defined-schema-in-spark/m-p/228073#M61001</link>
      <description>&lt;P&gt;A simple Spark app demonstrating how to read and write data in the Parquet and Avro
formats.&lt;/P&gt;&lt;P&gt;I hope this can help. &lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/sryza/simplesparkavroapp" target="_blank"&gt;https://github.com/sryza/simplesparkavroapp&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 May 2017 06:41:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Writing-avro-files-with-a-user-defined-schema-in-spark/m-p/228073#M61001</guid>
      <dc:creator>abbtah</dc:creator>
      <dc:date>2017-05-16T06:41:27Z</dc:date>
    </item>
  </channel>
</rss>

