<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: read a AVRO file stored in HDFS in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127965#M18151</link>
    <description>&lt;P&gt;I'm trying to write sample Java code, but I'm stuck on this step:&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;[root@sandbox deploy-4]# find / -name core-default.xml&lt;/P&gt;&lt;P&gt;
[root@sandbox deploy-4]# find / -name core-site..xml&lt;/P&gt;&lt;P&gt;There are no such files in the sandbox. How can I get past this step?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Fri, 05 Feb 2016 22:06:37 GMT</pubDate>
    <dc:creator>lenovomi</dc:creator>
    <dc:date>2016-02-05T22:06:37Z</dc:date>
    <item>
      <title>read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127959#M18145</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I want to read metadata from an Avro file stored in HDFS using the Avro API (&lt;A href="https://avro.apache.org/docs/1.4.1/api/java/org/apache/avro/file/DataFileReader.html"&gt;https://avro.apache.org/docs/1.4.1/api/java/org/apache/avro/file/DataFileReader.html&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;The Avro DataFileReader accepts only File objects. Is it possible to read data from a file stored on HDFS instead of on the local file system?&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 19:36:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127959#M18145</guid>
      <dc:creator>lenovomi</dc:creator>
      <dc:date>2016-02-05T19:36:14Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127960#M18146</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1997/lenovomi.html" nodeid="1997"&gt;@John Smith&lt;/A&gt; can you clarify: are you trying to do this programmatically in Java or in a Pig script? You can look up the schema using avro-tools with the getschema command (&lt;A href="http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/"&gt;Link&lt;/A&gt;). I once kept the schema in HDFS as XML (it can be any format, even the JSON that avro-tools outputs) and then processed new records. Maybe what you suggest, getting the schema from the file, is better. You can probably try reading it by passing the hdfs:// scheme rather than file:///&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 20:04:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127960#M18146</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-05T20:04:05Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127961#M18147</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm trying to do this as part of a Java program.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 21:05:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127961#M18147</guid>
      <dc:creator>lenovomi</dc:creator>
      <dc:date>2016-02-05T21:05:45Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127962#M18148</link>
      <description>&lt;P&gt;Can you call &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;avro-tools-1.7.4.jar &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;within a Pig script? Also, is it possible to access files stored on HDFS using avro-tools?&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 21:08:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127962#M18148</guid>
      <dc:creator>lenovomi</dc:creator>
      <dc:date>2016-02-05T21:08:17Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127963#M18149</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1997/lenovomi.html" nodeid="1997"&gt;@John Smith&lt;/A&gt; those are all valid questions :). I haven't tried, as there was never a need. Try it out and post an article! As for calling it from Pig, I'm not sure that's possible. Again, try it out. You might be able to look at the source code and write a UDF that does what avro-tools does; I don't know. By the way, the avro-tools version matches the Avro version, so I'd suggest downloading the latest avro-tools available, which at this moment is 1.8.0.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 21:36:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127963#M18149</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-05T21:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127964#M18150</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1997/lenovomi.html" nodeid="1997"&gt;@John Smith&lt;/A&gt; then look at how to infer the schema in the &lt;A href="http://avro.apache.org/docs/1.8.0/gettingstartedjava.html"&gt;Java&lt;/A&gt; API. You don't need avro-tools in that case.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 21:37:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127964#M18150</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-05T21:37:56Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127965#M18151</link>
      <description>&lt;P&gt;I'm trying to write sample Java code, but I'm stuck on this step:&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;[root@sandbox deploy-4]# find / -name core-default.xml&lt;/P&gt;&lt;P&gt;
[root@sandbox deploy-4]# find / -name core-site..xml&lt;/P&gt;&lt;P&gt;There are no such files in the sandbox. How can I get past this step?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 22:06:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127965#M18151</guid>
      <dc:creator>lenovomi</dc:creator>
      <dc:date>2016-02-05T22:06:37Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127966#M18152</link>
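      <description>&lt;P&gt;I created sample code, and it works fine.&lt;/P&gt;&lt;PRE&gt;import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URI;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String inputF = "hdfs://CustomerData-20160128-1501807.avro";
Path inPath = new Path(inputF);
Configuration conf = new Configuration();
// Set the NameNode address explicitly instead of relying on core-site.xml.
conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");
try {
    FileSystem fs = FileSystem.get(URI.create(inputF), conf);
    // DataFileStream reads from any InputStream, so an HDFS stream works;
    // try-with-resources closes the reader and the underlying stream.
    try (DataFileStream&lt;GenericRecord&gt; reader = new DataFileStream&lt;&gt;(
            new BufferedInputStream(fs.open(inPath)),
            new GenericDatumReader&lt;GenericRecord&gt;())) {
        Schema schema = reader.getSchema();
        System.out.println(schema.toString());
    }
} catch (IOException e) {
    e.printStackTrace();
}
&lt;/PRE&gt;</description>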
      <pubDate>Fri, 05 Feb 2016 22:17:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127966#M18152</guid>
      <dc:creator>lenovomi</dc:creator>
      <dc:date>2016-02-05T22:17:03Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127967#M18153</link>
      <description>&lt;P&gt;You should fix this forum website; it's a pain to format text, paste code, etc.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 22:18:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127967#M18153</guid>
      <dc:creator>lenovomi</dc:creator>
      <dc:date>2016-02-05T22:18:50Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127968#M18154</link>
      <description>&lt;P&gt;Look in the /etc/hadoop/conf directory, &lt;A rel="user" href="https://community.cloudera.com/users/1997/lenovomi.html" nodeid="1997"&gt;@John Smith&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 23:12:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127968#M18154</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-05T23:12:48Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127969#M18155</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1997/lenovomi.html" nodeid="1997"&gt;@John Smith&lt;/A&gt; use the "code" button to paste code.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 23:13:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127969#M18155</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-05T23:13:59Z</dc:date>
    </item>
    <item>
      <title>Re: read a AVRO file stored in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127970#M18156</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1997/lenovomi.html" nodeid="1997"&gt;@John Smith&lt;/A&gt; I edited the answer to format the code. &lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 23:15:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/read-a-AVRO-file-stored-in-HDFS/m-p/127970#M18156</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-05T23:15:17Z</dc:date>
    </item>
  </channel>
</rss>

