Read an Avro file stored in HDFS
Labels: Apache Hadoop
Created ‎02-05-2016 11:36 AM
Hi,
I want to read the metadata from an Avro file stored in HDFS using the Avro API ( https://avro.apache.org/docs/1.4.1/api/java/org/apache/avro/file/DataFileReader.html ).
The Avro DataFileReader accepts only File objects. Is it somehow possible to read data from a file stored on HDFS instead of one on the local filesystem?
Thank you
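One hedged alternative, not from this thread: DataFileReader itself also accepts any SeekableInput, and the avro-mapred artifact ships org.apache.avro.mapred.FsInput, which wraps a file on any Hadoop FileSystem as a SeekableInput. The sketch below writes a tiny Avro file to the local filesystem so it runs without a cluster (the Customer schema is made up for illustration); swapping the file:// path for an hdfs:// one reads from HDFS the same way.

```java
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class AvroSeekableDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative schema; any Avro container file would do.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        // Write one record so there is a file to read back.
        File f = File.createTempFile("demo", ".avro");
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, f);
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("id", 1L);
        writer.append(rec);
        writer.close();

        // FsInput gives DataFileReader a SeekableInput over any Hadoop
        // FileSystem; use a Path like hdfs://... to read from HDFS instead.
        Configuration conf = new Configuration();
        FsInput input = new FsInput(new Path(f.toURI()), conf);
        DataFileReader<GenericRecord> reader =
            new DataFileReader<>(input, new GenericDatumReader<GenericRecord>());
        System.out.println(reader.getSchema().getName());
        reader.close();
    }
}
```

This avoids copying the file to the local filesystem just to satisfy the File-based constructor.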
Created ‎02-05-2016 02:17 PM
I created sample code, and it works fine.
import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URI;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String inputF = "hdfs://CustomerData-20160128-1501807.avro";
Path inPath = new Path(inputF);
try {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");
    FileSystem fs = FileSystem.get(URI.create(inputF), conf);
    BufferedInputStream inStream = new BufferedInputStream(fs.open(inPath));
    DataFileStream<GenericRecord> reader =
            new DataFileStream<>(inStream, new GenericDatumReader<GenericRecord>());
    Schema schema = reader.getSchema();
    System.out.println(schema.toString());
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}
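As a follow-up sketch (not part of the original answer): beyond getSchema(), the same DataFileStream exposes the container-file metadata, such as the compression codec, and iterates over the records themselves. Written against a local file path passed on the command line so it runs without a cluster; the stream could equally come from fs.open() as in the post above.

```java
import java.io.FileInputStream;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroMetadata {
    public static void main(String[] args) throws Exception {
        try (DataFileStream<GenericRecord> reader = new DataFileStream<>(
                new FileInputStream(args[0]),
                new GenericDatumReader<GenericRecord>())) {
            // Writer schema embedded in the container file header.
            System.out.println("schema: " + reader.getSchema());
            // "avro.codec" is a standard metadata key (null if uncompressed).
            System.out.println("codec:  " + reader.getMetaString("avro.codec"));
            // DataFileStream is Iterable, so records can be streamed directly.
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}
```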
Created ‎02-05-2016 12:04 PM
@John Smith can you clarify: are you trying to do this programmatically in Java, or in a Pig script? You can look up the schema using avro-tools by passing the getschema flag (link). I once kept the schema in HDFS as XML, but it can be any format, even the JSON output of avro-tools, and then processed new records against it. Maybe what you suggest is better, to get the schema directly. You can probably try reading it and passing an hdfs:// scheme rather than file:///
Created ‎02-05-2016 01:05 PM
Hi,
I'm trying to do this as part of a Java program.
Created ‎02-05-2016 01:08 PM
Can you call avro-tools-1.7.4.jar within a Pig script? And is it also possible to access files stored on HDFS using avro-tools?
Created ‎02-05-2016 01:36 PM
@John Smith those are all valid questions :). I haven't tried, as there was never a need. Try it out and post an article! As far as accessing it from Pig, I'm not sure that's possible; again, try it out. You might be able to look at the source code and write a UDF that does what avro-tools does, I don't know. By the way, the avro-tools version tracks the Avro release, so I'd suggest downloading the latest avro-tools available, which at the moment is 1.8.0.
Created ‎02-05-2016 01:37 PM
@John Smith then look at how to read the schema using the Java API. You don't need avro-tools in that case.
Created ‎02-05-2016 02:06 PM
I'm trying to write sample Java code... but per
https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html
Configuration is supposed to pick up core-default.xml and core-site.xml, yet:
[root@sandbox deploy-4]# find / -name core-default.xml
[root@sandbox deploy-4]# find / -name core-site.xml
There are no such files in the sandbox. How can I get past this step?
thanks
Created ‎02-05-2016 03:12 PM
Look in the /etc/hadoop/conf directory @John Smith
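Building on that reply, a hedged sketch: instead of hard-coding fs.defaultFS as in the sample code, you can point the Configuration at the cluster's own client configs. The /etc/hadoop/conf paths follow the reply above and may differ on other installs.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadSiteConf {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load the sandbox's client configuration instead of setting
        // fs.defaultFS by hand; later resources override earlier ones.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println("default FS: " + fs.getUri());
    }
}
```

This way the code keeps working if the namenode address changes, since it is read from the same files the cluster's CLI tools use.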
Created ‎02-05-2016 02:18 PM
You should fix this forum website; it's a pain to format text, paste code, etc.
