Created on 05-06-2016 11:50 PM - edited 09-16-2022 03:17 AM
Hi, I have a very simple AVSC file, and I generated the Avro using the XML-to-Avro converter code on GitHub:
https://github.com/elodina/xml-avro/tree/master/src/ly/stealth/xmlavro/Converter.java
But when I query the table, I get the error below:
Avro - java.io.IOException: java.io.IOException: Not a data file.
I can see the Avro file inside the table folder. Kindly let me know what I am missing?
Below is my table DDL, with the AVSC embedded as avro.schema.literal:
CREATE TABLE embedded
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://csaa-aap-qa/apps/hive/warehouse/reservemodel.db/embedded'
TBLPROPERTIES ('avro.schema.literal'='{
  "fields": [
    { "name": "BillDate", "source": "element BillDate", "type": "string" },
    { "name": "BillTime", "source": "element BillTime", "type": "string" },
    { "name": "Remit_CompanyName", "source": "element Remit_CompanyName", "type": "string" },
    { "name": "Remit_Addr", "source": "element Remit_Addr", "type": "string" },
    { "name": "Remit_CityStZip", "source": "element Remit_CityStZip", "type": "string" },
    { "name": "Remit_Phone", "source": "element Remit_Phone", "type": "string" },
    { "name": "Remit_Fax", "source": "element Remit_Fax", "type": "string" },
    { "name": "Remit_TaxID", "source": "element Remit_TaxID", "type": "string" },
    { "name": "Previous_Balance", "source": "element Previous_Balance", "type": "string" },
    { "name": "others", "type": { "type": "map", "values": "string" } }
  ],
  "name": "MetroBillType",
  "namespace": "ly.stealth.xmlavro",
  "protocol": "xml",
  "type": "record"
}');
And the XML is:
<?xml version="1.0" encoding="UTF-8" ?>
<MetroBill xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
           xs:noNamespaceSchemaLocation="Metrobill.xsd">
  <BillDate>02/29/2016</BillDate>
  <BillTime>18:49:05</BillTime>
  <Remit_CompanyName>METROPOLITAN REPORTING BUREAU</Remit_CompanyName>
  <Remit_Addr>P.O. BOX 926, WILLIAM PENN ANNEX</Remit_Addr>
  <Remit_CityStZip>PHILADELPHIA, PA 19105-0926</Remit_CityStZip>
  <Remit_Phone>(800) 245-6686</Remit_Phone>
  <Remit_Fax>(800) 343-9047</Remit_Fax>
  <Remit_TaxID>23-1879730</Remit_TaxID>
  <Previous_Balance>1663</Previous_Balance>
</MetroBill>
The Avro generated is full of special characters:
02/29/201618:49:05:METROPOLITAN REPORTING BUREAU@P.O. BOX 926, WILLIAM PENN ANNEX6PHILADELPHIA, PA 19105-0926(800) 245-6686(800) 343-904723-18797301663
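For what it's worth, a valid Avro data file (an "object container file") begins with the 4-byte magic header Obj\x01, followed by file metadata that embeds the writer schema, and only then the data blocks. A file that starts directly with the field values, as the dump above suggests, is raw serialized datums, and Hive's Avro input format rejects exactly that with "Not a data file". A minimal sketch to check the header, assuming Python and a local copy of the file (the filename "MetroBill.avro" is hypothetical):

# check_avro_magic.py -- a minimal sketch; copy the file down locally first.
with open("MetroBill.avro", "rb") as f:
    magic = f.read(4)

# Per the Avro spec, a container file starts with b"Obj" plus version byte \x01.
if magic == b"Obj\x01":
    print("Looks like an Avro container file")
else:
    print("Not a container file: missing the Obj\\x01 magic header")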
Regards
Mamta Chawla
Created 05-07-2016 09:31 AM
I'm also having a lot of problems with Avro (see here) and have seen exactly the same problem you are having.
In my case I was sending Avro objects to Kafka and then having Flume transfer them from Kafka to HDFS. If I generated the objects (from Python) using DatumWriter and sent raw bytes, I could not decode them in HDFS or Hive, even if I specified a schema. If I used DataFileWriter (such that the schema was included in the file) and uploaded to HDFS manually, all was fine.
I suspect the problem is with the Avro SerDe. It looks like it is ignoring the supplied schema and always looking in the object for the schema definition.
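To make the distinction concrete, here is a minimal sketch using the avro PyPI package (the schema is a cut-down, illustrative version of the MetroBill record above, and the file name is hypothetical; note that API casing has varied across releases, e.g. the old avro-python3 package renamed parse to Parse):

import io
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import BinaryEncoder, DatumWriter

# Illustrative, cut-down schema based on the MetroBill record above.
SCHEMA = avro.schema.parse("""
{
  "type": "record",
  "name": "MetroBillType",
  "namespace": "ly.stealth.xmlavro",
  "fields": [
    {"name": "BillDate", "type": "string"},
    {"name": "BillTime", "type": "string"}
  ]
}
""")

record = {"BillDate": "02/29/2016", "BillTime": "18:49:05"}

# 1) Raw datum bytes (plain DatumWriter): no "Obj" header, no embedded
#    schema -- Hive rejects this with "Not a data file".
buf = io.BytesIO()
DatumWriter(SCHEMA).write(record, BinaryEncoder(buf))
raw_bytes = buf.getvalue()   # starts directly with the encoded field values

# 2) Container file (DataFileWriter): magic header, embedded schema, data
#    blocks -- this is what the Hive AvroSerDe can read.
out = open("metrobill.avro", "wb")
writer = DataFileWriter(out, DatumWriter(), SCHEMA)
writer.append(record)
writer.close()   # flushes the header, schema and data blocks

Only the second variant, uploaded to the table's LOCATION, will be readable by a Hive query; the raw bytes produce exactly the "Not a data file" error.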
Created 05-09-2016 10:58 PM
Hi Mark,
Can you please provide me a sample of how to add the Avro schema with the Avro data?
Thanks
Mamta
Created 05-10-2016 08:00 AM
I am a novice at data loading and Avro, so I can't verify the Avro schema etc. provided in the post, but I did face the same issue (java.io.IOException: java.io.IOException: Not a data file). I am listing below the steps I carried out to load a SQL Server table into Avro and then into a Hive external table; maybe it will provide some pointers:
--create dir. for sqoop import
[hdfs@l1031lab root]$ hadoop fs -mkdir -p /dataload/tohdfs/reio/odpdw/may2016/
[hdfs@l1031lab root]$
--grant permissions to sqoop
[hdfs@l1031lab root]$ hadoop fs -chown -R sqoop /dataload
[hdfs@l1031lab root]$
[hdfs@l1031lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/
[hdfs@l1031lab root]$
--sqoop import
sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' \
  --username someusername --password somepassword \
  --as-avrodatafile --num-mappers 8 --table DimSampleDesc \
  --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose
An exception was thrown, but the Avro files were created:
Writing Avro schema file: /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/DimSampleDesc.avsc
16/05/09 13:09:00 DEBUG mapreduce.DataDrivenImportJob: Could not move Avro schema file to code output directory.
java.io.FileNotFoundException: Destination directory '.' does not exist [createDestDir=true]
        at org.apache.commons.io.FileUtils.moveFileToDirectory(FileUtils.java:2865)
        at org.apache.sqoop.mapreduce.DataDrivenImportJob.writeAvroSchema(DataDrivenImportJob.java:146)
        at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:92)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
There was an error moving the schema file, so copy it manually. If this step is skipped, the table will still get created, but when you do a 'select' on the table, you get the 'Not a data file' error message.
[sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/* /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/
--to allow hive to write, give write permissions to all
[sqoop@l1038lab root]$ hadoop fs -chmod -R a+w /dataload/
[sqoop@l1038lab root]$
--view the warehouse-dir
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
Found 15 items
-rw-rw-rw-   3 sqoop hdfs      3659 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
-rw-rw-rw-   3 sqoop hdfs     28540 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.class
-rw-rw-rw-   3 sqoop hdfs      9568 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.jar
-rw-rw-rw-   3 sqoop hdfs     61005 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.java
-rw-rw-rw-   3 sqoop hdfs         0 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS
-rw-rw-rw-   3 sqoop hdfs      2660 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro
-rw-rw-rw-   3 sqoop hdfs   1465072 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro
-rw-rw-rw-   3 sqoop hdfs   3577459 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro
-rw-rw-rw-   3 sqoop hdfs   1203520 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro
-rw-rw-rw-   3 sqoop hdfs    236282 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro
-rw-rw-rw-   3 sqoop hdfs    638532 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro
-rw-rw-rw-   3 sqoop hdfs    850454 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00006.avro
-rw-rw-rw-   3 sqoop hdfs    257025 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00007.avro
-rw-rw-rw-   3 sqoop hdfs    341204 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00008.avro
-rw-rw-rw-   3 sqoop hdfs      4792 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00009.avro
[sqoop@l1038lab root]$
--view the schema file
[sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
  "type" : "record",
  "name" : "DimSampleDesc",
  "doc" : "Sqoop import of DimSampleDesc",
  "fields" : [
    { "name" : "SmapiName_ver", "type" : [ "null", "string" ], "default" : null, "columnName" : "SmapiName_ver", "sqlType" : "12" },
    { "name" : "SmapiColName", "type" : [ "null", "string" ], "default" : null, "columnName" : "SmapiColName", "sqlType" : "12" },
    { "name" : "ChartType", "type" : [ "null", "int" ], "default" : null, "columnName" : "ChartType", "sqlType" : "4" },
    { "name" : "X_Indexet", "type" : [ "null", "int" ], "default" : null, "columnName" : "X_Indexet", "sqlType" : "4" },
    { "name" : "Y_Indexet", "type" : [ "null", "int" ], "default" : null, "columnName" : "Y_Indexet", "sqlType" : "4" },
    { "name" : "X_Tick", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_Tick", "sqlType" : "-9" },
    { "name" : "Y_Tick", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_Tick", "sqlType" : "-9" },
    { "name" : "X_TickRange", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_TickRange", "sqlType" : "-9" },
    { "name" : "X_TickRangeFrom", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_TickRangeFrom", "sqlType" : "-9" },
    { "name" : "X_TickRangeTom", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_TickRangeTom", "sqlType" : "-9" },
    { "name" : "Y_TickRange", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_TickRange", "sqlType" : "-9" },
    { "name" : "Y_TickRangeFrom", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_TickRangeFrom", "sqlType" : "-9" },
    { "name" : "Y_TickRangeTom", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_TickRangeTom", "sqlType" : "-9" },
    { "name" : "IndexCount", "type" : [ "null", "int" ], "default" : null, "columnName" : "IndexCount", "sqlType" : "4" },
    { "name" : "X_IndexCount", "type" : [ "null", "int" ], "default" : null, "columnName" : "X_IndexCount", "sqlType" : "4" },
    { "name" : "Y_IndexCount", "type" : [ "null", "int" ], "default" : null, "columnName" : "Y_IndexCount", "sqlType" : "4" },
    { "name" : "X_Symbol", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_Symbol", "sqlType" : "-9" },
    { "name" : "X_SymbolName", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_SymbolName", "sqlType" : "-9" },
    { "name" : "X_SymbolDescr", "type" : [ "null", "string" ], "default" : null, "columnName" : "X_SymbolDescr", "sqlType" : "-9" },
    { "name" : "Y_Symbol", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_Symbol", "sqlType" : "-9" },
    { "name" : "Y_SymbolName", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_SymbolName", "sqlType" : "-9" },
    { "name" : "Y_SymbolDescr", "type" : [ "null", "string" ], "default" : null, "columnName" : "Y_SymbolDescr", "sqlType" : "-9" },
    { "name" : "SmapiName", "type" : [ "null", "string" ], "default" : null, "columnName" : "SmapiName", "sqlType" : "12" },
    { "name" : "Incorrect_Ver_FL", "type" : [ "null", "boolean" ], "default" : null, "columnName" : "Incorrect_Ver_FL", "sqlType" : "-7" }
  ],
  "tableName" : "DimSampleDesc"
}
[sqoop@l1038lab root]$
--create the hive table in the schema, i.e., first do a 'use database' at the hive prompt and then create the table:
hive> CREATE EXTERNAL TABLE DimSampleDesc
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    > TBLPROPERTIES ('avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.37 seconds
hive>
--describe the table
hive> describe formatted DimSampleDesc;
OK
# col_name              data_type       comment
smapiname_ver           string
smapicolname            string
charttype               int
x_indexet               int
y_indexet               int
x_tick                  string
y_tick                  string
x_tickrange             string
x_tickrangefrom         string
x_tickrangetom          string
y_tickrange             string
y_tickrangefrom         string
y_tickrangetom          string
indexcount              int
x_indexcount            int
y_indexcount            int
x_symbol                string
x_symbolname            string
x_symboldescr           string
y_symbol                string
y_symbolname            string
y_symboldescr           string
smapiname               string
incorrect_ver_fl        boolean

# Detailed Table Information
Database:               odp_dw_may2016
Owner:                  hive
CreateTime:             Mon May 09 16:37:08 CEST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type:             EXTERNAL_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        EXTERNAL                TRUE
        avro.schema.url         hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
        numFiles                6
        numRows                 -1
        rawDataSize             -1
        totalSize               8566342
        transient_lastDdlTime   1462804628

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat:            org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.431 seconds, Fetched: 56 row(s)
hive>
--execute some select queries
hive> select * from DimSampleDesc limit 5;
OK
Abnormal_ShutDown_ct_1  Abnormal_ShutDown_ct_1_000  7  0  0  1  1  1  times  counts  -  Abnormal_ShutDown_ct  NULL
Abnormal_ShutDown_ct_2  Abnormal_ShutDown_ct_2_000  7  0  0  1  1  1  %  percent  -  Abnormal_ShutDown_ct  NULL
ABS_ActivCOO_ct_1  ABS_ActivCOO_ct_1_000  7  0  0  1  1  1  times  counts  -  ABS_ActivCOO_ct  NULL
ABS_ActivCOO_ct_2  ABS_ActivCOO_ct_2_000  1  0  0  0  0  0  0  3  3  0  times  counts  -  ABS_ActivCOO_ct  NULL
ABS_ActivCOO_ct_2  ABS_ActivCOO_ct_2_001  1  1  0  1  1  1  1  3  3  0  times  counts  -  ABS_ActivCOO_ct  NULL
Time taken: 0.095 seconds, Fetched: 5 row(s)
hive>
The core of it is to have the .avsc and .avro files in place and to ensure proper permissions.
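If a 'select' still fails after these steps, one quick way to confirm a part file is a genuine Avro container is to pull it down and read it back. A minimal sketch, assuming the avro PyPI package (the local path is illustrative):

# Fetch one part file locally first, e.g.:
#   hadoop fs -get /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro .
from avro.datafile import DataFileReader
from avro.io import DatumReader

with open("part-m-00000.avro", "rb") as f:
    # DataFileReader raises immediately if the Obj\x01 magic header is missing.
    reader = DataFileReader(f, DatumReader())
    for i, record in enumerate(reader):
        print(record)
        if i == 4:   # the first five records are enough for a sanity check
            break
    reader.close()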
I hope that helps.
Created 05-11-2016 05:40 AM
Thanks @Kaliyug Antagonist, the solution worked.
Created 05-19-2016 02:35 PM
Can you let me know how it worked for you? Did you give permissions to the avsc and avro files and create the hive tables to solve the above issue?
Created 02-22-2018 01:08 PM
For me, the issue was due to some data already present at the external table's HDFS location. Since that data was not in Avro format, the table was not able to read it properly.
I just removed it and the table worked fine.