<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re writing Avro map reduce to Parquet map reduce in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Re-writing-Avro-map-reduce-to-Parquet-map-reduce/m-p/34765#M11405</link>
    <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have a Java MapReduce application that reads in binary files, does some data processing, and converts the data to Avro. Currently we have two Avro schemas and use the AvroMultipleOutputs class to write to multiple locations based on the schema. After doing some research, we found that it would be beneficial to store the data as Parquet. What is the best way to do this? Should I change the native MapReduce job to convert from Avro to Parquet, or is there some other utility I can use?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Nishan&lt;/P&gt;</description>
    <pubDate>Fri, 04 Dec 2015 18:16:35 GMT</pubDate>
    <dc:creator>Nishan</dc:creator>
    <dc:date>2015-12-04T18:16:35Z</dc:date>
    <item>
      <title>Re writing Avro map reduce to Parquet map reduce</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Re-writing-Avro-map-reduce-to-Parquet-map-reduce/m-p/34765#M11405</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have a Java MapReduce application that reads in binary files, does some data processing, and converts the data to Avro. Currently we have two Avro schemas and use the AvroMultipleOutputs class to write to multiple locations based on the schema. After doing some research, we found that it would be beneficial to store the data as Parquet. What is the best way to do this? Should I change the native MapReduce job to convert from Avro to Parquet, or is there some other utility I can use?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Nishan&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2015 18:16:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Re-writing-Avro-map-reduce-to-Parquet-map-reduce/m-p/34765#M11405</guid>
      <dc:creator>Nishan</dc:creator>
      <dc:date>2015-12-04T18:16:35Z</dc:date>
    </item>
    <item>
      <title>Re: Re writing Avro map reduce to Parquet map reduce</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Re-writing-Avro-map-reduce-to-Parquet-map-reduce/m-p/34780#M11406</link>
      <description>&lt;P&gt;I tried using AvroParquetOutputFormat and the MultipleOutputs class and was able to generate Parquet files for one schema type. For the other schema type I am running into the error below. Any help is appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;java.lang.ArrayIndexOutOfBoundsException: 2820&lt;BR /&gt;at org.apache.parquet.io.api.Binary.hashCode(Binary.java:489)&lt;BR /&gt;at org.apache.parquet.io.api.Binary.access$100(Binary.java:34)&lt;BR /&gt;at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.hashCode(Binary.java:382)&lt;BR /&gt;at org.apache.parquet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOpenHashMap.getInt(Object2IntLinkedOpenHashMap.java:587)&lt;BR /&gt;at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.writeBytes(DictionaryValuesWriter.java:235)&lt;BR /&gt;at org.apache.parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:162)&lt;BR /&gt;at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:203)&lt;BR /&gt;at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:347)&lt;BR /&gt;at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:257)&lt;BR /&gt;at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)&lt;BR /&gt;at org.apache.parquet.avro.AvroWriteSupport.writeRecord(AvroWriteSupport.java:149)&lt;BR /&gt;at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:262)&lt;BR /&gt;at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)&lt;BR /&gt;at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)&lt;BR /&gt;at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)&lt;BR 
/&gt;at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)&lt;BR /&gt;at org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat$LazyRecordWriter.write(LazyOutputFormat.java:115)&lt;BR /&gt;at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:457)&lt;BR /&gt;at com.visa.dps.mapreduce.logger.LoggerMapper.map(LoggerMapper.java:271)&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2015 20:46:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Re-writing-Avro-map-reduce-to-Parquet-map-reduce/m-p/34780#M11406</guid>
      <dc:creator>Nishan</dc:creator>
      <dc:date>2015-12-04T20:46:28Z</dc:date>
    </item>
  </channel>
</rss>

