<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Reading-avro-file-from-HDFS-using-Scala-Exception-in-thread/m-p/227267#M75203</link>
    <description>&lt;P&gt;Issue is fixed. This is not an Avro data file, just an Avro schema, so it needs to be read as a text file.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Mar 2018 09:28:14 GMT</pubDate>
    <dc:creator>karan_alang1</dc:creator>
    <dc:date>2018-03-12T09:28:14Z</dc:date>
    <item>
      <title>Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Reading-avro-file-from-HDFS-using-Scala-Exception-in-thread/m-p/227266#M75202</link>
      <description>&lt;P&gt;Hello, I'm reading an Avro file from HDFS, and it throws an exception:&lt;/P&gt;&lt;PRE&gt;Exception in thread "main" java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileStream.&amp;lt;init&amp;gt;(DataFileStream.java:84)
at karan.scala.readuri.ReadAvroFromURI$.readAvroFromURI(ReadAvroFromURI.scala:52)
at karan.scala.readuri.ReadAvroFromURI$.delayedEndpoint$karan$scala$readuri$ReadAvroFromURI$1(ReadAvroFromURI.scala:29)
at karan.scala.readuri.ReadAvroFromURI$delayedInit$body.apply(ReadAvroFromURI.scala:24)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$anonfun$main$1.apply(App.scala:76)
at scala.App$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at karan.scala.readuri.ReadAvroFromURI$.main(ReadAvroFromURI.scala:24)
at karan.scala.readuri.ReadAvroFromURI.main(ReadAvroFromURI.scala)&lt;/PRE&gt;&lt;P&gt;Here is the code:&lt;/P&gt;&lt;PRE&gt;import java.io.BufferedInputStream, java.net.URI
import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://localhost:9000")
val inputF = "hdfs://localhost:9000/avro/emp.avsc"
val inPath = new Path(inputF)
val fs = FileSystem.get(URI.create(inputF), conf)
val inStream = new BufferedInputStream(fs.open(inPath))
val reader = new DataFileStream(inStream, new GenericDatumReader[GenericRecord]()) // DataFileStream reads Avro object container files&lt;/PRE&gt;&lt;P&gt;DataFileStream.java seems to check the start of the stream for the Avro magic bytes to determine whether the input is an Avro data file. It does not find them, so it throws the error:&lt;/P&gt;&lt;PRE&gt;void initialize(InputStream in) throws IOException {
  this.header = new Header();
  this.vin = DecoderFactory.get().binaryDecoder(in, vin);
  byte[] magic = new byte[DataFileConstants.MAGIC.length];
  try {
    vin.readFixed(magic); // read magic
  } catch (IOException e) {
    throw new IOException("Not a data file.");
  }
  if (!Arrays.equals(DataFileConstants.MAGIC, magic))
    throw new IOException("Not a data file.");&lt;/PRE&gt;&lt;P&gt;Any ideas on how to fix this?&lt;/P&gt;&lt;P&gt;The file itself is fine; I'm able to cat it (output shown below):&lt;/P&gt;&lt;PRE&gt;hdfs dfs -cat hdfs://localhost:9000/avro/emp.avsc

18/03/01 15:30:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

{"namespace": "tutorialspoint.com","type": "record","name": "emp","fields": [{"name": "name", "type": "string"},{"name": "id", "type": "int"},{"name": "salary", "type": "int"},{"name": "age", "type": "int"},{"name": "address", "type": "string"}]}&lt;/PRE&gt;</description>
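The magic-byte check described in the question can be illustrated with a small JDK-only Scala sketch. This is not Avro's own code: the names `AvroMagicCheck` and `looksLikeAvroDataFile` are hypothetical, and the sketch assumes the stream supports mark/reset (for example the `BufferedInputStream` used in the question). It relies on the Avro specification's rule that an object container file begins with the bytes `O`, `b`, `j` followed by the byte value 1; a `.avsc` schema is JSON text and starts with `{` instead.

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, DataInputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets
import java.util.Arrays

object AvroMagicCheck {
  // Avro object container files start with 'O', 'b', 'j' followed by the byte 1.
  val AvroMagic: Array[Byte] = Array('O'.toByte, 'b'.toByte, 'j'.toByte, 1.toByte)

  // Hypothetical helper: peeks at the first four bytes and reports whether they
  // match the magic. mark/reset leaves the stream unconsumed, so it can still be
  // passed to DataFileStream afterwards.
  def looksLikeAvroDataFile(in: InputStream): Boolean = {
    require(in.markSupported(), "wrap the stream in a BufferedInputStream first")
    in.mark(AvroMagic.length)
    val header = new Array[Byte](AvroMagic.length)
    try {
      new DataInputStream(in).readFully(header)
      Arrays.equals(header, AvroMagic)
    } catch {
      case _: EOFException => false // fewer than four bytes: cannot be a data file
    } finally {
      in.reset()
    }
  }

  def main(args: Array[String]): Unit = {
    // A schema file is JSON text, so it starts with '{' rather than "Obj".
    val avsc = """{"namespace": "tutorialspoint.com", "type": "record", "name": "emp", "fields": []}"""
    val in = new BufferedInputStream(new ByteArrayInputStream(avsc.getBytes(StandardCharsets.UTF_8)))
    println(looksLikeAvroDataFile(in)) // false
  }
}
```

Because the check resets the stream, a caller could run it first and only construct a `DataFileStream` when it returns true, turning the "Not a data file." exception into an explicit branch.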
      <pubDate>Fri, 16 Sep 2022 12:55:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Reading-avro-file-from-HDFS-using-Scala-Exception-in-thread/m-p/227266#M75202</guid>
      <dc:creator>karan_alang1</dc:creator>
      <dc:date>2022-09-16T12:55:26Z</dc:date>
    </item>
    <item>
      <title>Re: Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Reading-avro-file-from-HDFS-using-Scala-Exception-in-thread/m-p/227267#M75203</link>
      <description>&lt;P&gt;Issue is fixed. This is not an Avro data file, just an Avro schema, so it needs to be read as a text file.&lt;/P&gt;</description>
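A minimal sketch of the fix described in this answer, assuming the Avro and Hadoop client libraries are on the classpath: the `.avsc` file is read as UTF-8 text and parsed with Avro's `Schema.Parser`, instead of being passed to `DataFileStream`. The names `ReadAvscFromHdfs` and `readSchema` are hypothetical; the default path is the one from the question. `DataFileStream` is still the reader for actual Avro data (container) files.

```scala
import java.net.URI

import org.apache.avro.Schema
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

import scala.collection.JavaConverters._
import scala.io.Source

object ReadAvscFromHdfs {
  // Hypothetical helper: reads an .avsc file (plain JSON text) from any
  // Hadoop-supported URI (hdfs://, file:, ...) and parses it as an Avro schema.
  def readSchema(uri: String, conf: Configuration = new Configuration()): Schema = {
    val fs = FileSystem.get(URI.create(uri), conf)
    val in = fs.open(new Path(uri))
    // Read the whole file as text instead of handing it to DataFileStream.
    val json = try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
    new Schema.Parser().parse(json)
  }

  def main(args: Array[String]): Unit = {
    // Default to the path from the question; override with the first argument.
    val uri = if (args.nonEmpty) args(0) else "hdfs://localhost:9000/avro/emp.avsc"
    val schema = readSchema(uri)
    println(schema.getFullName)
    schema.getFields.asScala.foreach(f => println(s"${f.name()}: ${f.schema().getType}"))
  }
}
```

Passing a `file:` URI as the first argument lets the same code run against a local copy of the schema without an HDFS cluster.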
      <pubDate>Mon, 12 Mar 2018 09:28:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Reading-avro-file-from-HDFS-using-Scala-Exception-in-thread/m-p/227267#M75203</guid>
      <dc:creator>karan_alang1</dc:creator>
      <dc:date>2018-03-12T09:28:14Z</dc:date>
    </item>
  </channel>
</rss>

