Read an Avro file stored in HDFS

Expert Contributor

Hi,

I want to read metadata from an Avro file stored in HDFS using the Avro API ( https://avro.apache.org/docs/1.4.1/api/java/org/apache/avro/file/DataFileReader.html ).

The Avro DataFileReader accepts only File objects. Is it possible to read data from a file stored on HDFS instead of on the local file system?

Thank you

Accepted Solutions

Re: read a AVRO file stored in HDFS

Expert Contributor

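I created some sample code, and it works fine.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URI;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Path as posted; a full HDFS URI normally has the form hdfs://host:port/path/to/file.avro
String inputF = "hdfs://CustomerData-20160128-1501807.avro";
Path inPath = new Path(inputF);
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");
try {
    FileSystem fs = FileSystem.get(URI.create(inputF), conf);
    // DataFileStream reads from any InputStream, so no local File is needed.
    // try-with-resources closes the reader and the underlying HDFS stream.
    try (DataFileStream<GenericRecord> reader = new DataFileStream<>(
            new BufferedInputStream(fs.open(inPath)), new GenericDatumReader<GenericRecord>())) {
        Schema schema = reader.getSchema();
        System.out.println(schema.toString());
    }
} catch (IOException e) {
    e.printStackTrace();
}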
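
To illustrate why a plain InputStream is enough for reading the metadata, here is a minimal sketch that parses the header of an Avro object container file using only the JDK. It is not a replacement for DataFileStream and it is not how the Avro library is implemented; the class name AvroHeaderSketch and the in-memory sample file are made up for the example. On a cluster the stream would presumably come from fs.open(inPath) as in the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class AvroHeaderSketch {

    // Decode one Avro long: variable-length bytes, zigzag encoded.
    public static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Avro strings and bytes share one layout: a long length, then that many bytes.
    public static byte[] readBytes(InputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Read the magic bytes, the metadata map, and the sync marker; return the metadata.
    public static Map<String, byte[]> readHeader(InputStream in) throws IOException {
        byte[] magic = new byte[4];
        new DataInputStream(in).readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // The map is a series of blocks, terminated by a block count of zero.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {      // a negative count is followed by the block size in bytes
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        new DataInputStream(in).readFully(new byte[16]); // skip the 16-byte sync marker
        return meta;
    }

    // Write a length-prefixed string; valid only for lengths below 64 (single-byte varint).
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b.length << 1);
        out.write(b, 0, b.length);
    }

    // Build a tiny in-memory header so the sketch runs without HDFS.
    public static byte[] sampleHeader() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {'O', 'b', 'j', 1}, 0, 4);
        out.write(2 << 1);                    // one map block with two entries
        writeString(out, "avro.schema");
        writeString(out, "\"string\"");
        writeString(out, "avro.codec");
        writeString(out, "null");
        out.write(0);                         // end of map
        out.write(new byte[16], 0, 16);       // sync marker
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(sampleHeader());
        for (Map.Entry<String, byte[]> e : readHeader(in).entrySet()) {
            System.out.println(e.getKey() + " = "
                    + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

The schema is stored in the header under the avro.schema key, which is what reader.getSchema() exposes in the code above. For real files the Avro library remains the safer choice, since it also handles codecs and data blocks.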

Re: read a AVRO file stored in HDFS

Mentor

@John Smith can you clarify: are you trying to do this programmatically in Java or in a Pig script? You can look up the schema with avro-tools by using its getschema command. I once kept the schema in HDFS as XML, but it can be in any format, even the JSON output of avro-tools, and then used it to process new records. Maybe what you suggest, getting the schema from the file itself, is better. You can probably try reading it by passing the hdfs:// scheme rather than file:///.

Re: read a AVRO file stored in HDFS

Expert Contributor

Hi,

I'm trying to do this as part of a Java program.

Re: read a AVRO file stored in HDFS

Expert Contributor

Can you call avro-tools-1.7.4.jar from within a Pig script? Also, is it possible to access files stored on HDFS using avro-tools?

Re: read a AVRO file stored in HDFS

Mentor

@John Smith those are all valid questions :). I haven't tried, as there was never a need. Try it out and post an article! As for calling it from Pig, I'm not sure that's possible; again, try it out. You might be able to look at the source code and write a UDF that does what avro-tools does, I don't know. By the way, avro-tools is versioned together with Avro, so I'd suggest downloading the latest avro-tools available, which at this moment is 1.8.0.

Re: read a AVRO file stored in HDFS

Mentor

@John Smith then look at how to get the schema with the Java API. You don't need avro-tools in that case.

Re: read a AVRO file stored in HDFS

Expert Contributor

I'm trying to write sample Java code, but I'm stuck on the configuration files described here:

https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html

[root@sandbox deploy-4]# find / -name core-default.xml

[root@sandbox deploy-4]# find / -name core-site..xml

There are no such files in the sandbox. How can I get past this step?

Thanks

Re: read a AVRO file stored in HDFS

Mentor

Look in the /etc/hadoop/conf directory, @John Smith.
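
For orientation, core-site.xml is a list of property entries, each with a name and a value. Hadoop's Configuration class normally loads it from the classpath, so the following is only a sketch of the file layout using the JDK's XML parser. The class name CoreSiteSketch and the inline sample document are made up for the example; the fs.defaultFS value is the one used elsewhere in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
public class CoreSiteSketch {

    // Return the <value> of the <property> whose <name> matches, or null if absent.
    public static String lookup(InputStream xml, String name) throws Exception {
        NodeList props = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(xml).getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
            if (n.equals(name)) {
                return p.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the contents of /etc/hadoop/conf/core-site.xml.
        String sample = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>hdfs://sandbox.hortonworks.com:8020</value></property></configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(lookup(in, "fs.defaultFS"));
    }
}
```

If the conf directory is on the application's classpath, new Configuration() should pick up fs.defaultFS on its own, and setting it by hand is only needed when it is not.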

Re: read a AVRO file stored in HDFS

Expert Contributor

You should fix this forum website; it's a pain to format text, paste code, etc.
