Problem reading AVRO file in HIVE after Sqoop import: IOException: Not a data file

After importing tables from a database with Sqoop in Avro file format, Hive cannot read the resulting external table.

It gives only this error message:

Failed with exception java.io.IOException:java.io.IOException: Not a data file

 

These are the steps:

1. I imported a query result set into HDFS with Sqoop, using --as-avrodatafile.

2. After the import I downloaded one file to extract the Avro schema. Using avro-tools, I extracted the schema and uploaded it to the HDFS directory:

java -jar ./avro-tools-1.7.7.jar getschema ./part-m-00000.avro > mytable.avsc

3. The directory and file permissions are set to readable by everybody.

4. After that I create an external table like this:

 

CREATE EXTERNAL TABLE mytable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hdfs/mytable'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hdfs/mytable/mytable.avsc');

 

5. Hive creates the table, but I cannot select from it.

6. However, in Impala, after a metadata refresh, I can query the table without any problem.
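
One check that may help narrow this down: Avro container files begin with the four magic bytes 'O', 'b', 'j', 0x01, and in the steps above the schema file mytable.avsc sits in the same directory that the table LOCATION points to. The sketch below tests whether a local copy of a file starts with those bytes. The function name avro_magic_check is made up for illustration and is not part of avro-tools or Hive; it only inspects the header and does not validate the rest of the file.

```shell
#!/bin/sh
# Hypothetical helper: prints "avro" if the given local file starts with
# the Avro container magic bytes ('O', 'b', 'j', 0x01), else "not-avro".
avro_magic_check() {
    # Read the first four bytes and render them as one hex string.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "4f626a01" ]; then
        echo "avro"
    else
        echo "not-avro"
    fi
}
```

Running it on each file downloaded from /user/hdfs/mytable would show which ones are not Avro data files. For a file still in HDFS, the same four bytes can be inspected with: hdfs dfs -cat <path> | head -c 4 | od -An -tx1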

 

So I assume the problem is in Hive.

Please help.

Thanks,

Tomas.