Created on 05-06-2016 11:50 PM - edited 09-16-2022 03:17 AM
Hi,I have very simple AVSC file, And I generated the Avro using GitHub Code, xml to Avro converter,
https://github.com/elodina/xml-avro/tree/master/src/ly/stealth/xmlavro/Converter.java
But when I query the table I get below error.
Avro - java.io.IOException: java.io.IOException: Not a data file.
I can see the avro file inside the table folder. Kindly let me know what I am missing??
Below is my AVSC-
CREATE TABLE embedded COMMENT "just drop the schema right into the HQL" ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://csaa-aap-qa/apps/hive/warehouse/reservemodel.db/embedded' TBLPROPERTIES ( 'avro.schema.literal'='{ "fields": [ { "name": "BillDate", "source": "element BillDate", "type": "string" }, { "name": "BillTime", "source": "element BillTime", "type": "string" }, { "name": "Remit_CompanyName", "source": "element Remit_CompanyName", "type": "string" }, { "name": "Remit_Addr", "source": "element Remit_Addr", "type": "string" }, { "name": "Remit_CityStZip", "source": "element Remit_CityStZip", "type": "string" }, { "name": "Remit_Phone", "source": "element Remit_Phone", "type": "string" }, { "name": "Remit_Fax", "source": "element Remit_Fax", "type": "string" }, { "name": "Remit_TaxID", "source": "element Remit_TaxID", "type": "string" }, { "name": "Previous_Balance", "source": "element Previous_Balance", "type": "string" }, { "name": "others", "type": { "type": "map", "values": "string" } } ], "name": "MetroBillType", "namespace": "ly.stealth.xmlavro", "protocol": "xml", "type": "record" } ') ;
And the XML is.
<?xml version="1.0" encoding="UTF-8" ?> <MetroBill xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" xs:noNamespaceSchemaLocation="Metrobill.xsd" > <BillDate>02/29/2016</BillDate> <BillTime>18:49:05</BillTime> <Remit_CompanyName>METROPOLITAN REPORTING BUREAU</Remit_CompanyName> <Remit_Addr>P.O. BOX 926, WILLIAM PENN ANNEX</Remit_Addr> <Remit_CityStZip>PHILADELPHIA, PA 19105-0926</Remit_CityStZip> <Remit_Phone>(800) 245-6686</Remit_Phone> <Remit_Fax>(800) 343-9047</Remit_Fax> <Remit_TaxID>23-1879730</Remit_TaxID> <Previous_Balance>1663</Previous_Balance> </MetroBill>
The Avro genrated is with special chars-
02/29/201618:49:05:METROPOLITAN REPORTING BUREAU@P.O. BOX 926, WILLIAM PENN ANNEX6PHILADELPHIA, PA 19105-0926(800) 245-6686(800) 343-904723-18797301663
Regards
Mamta Chawla
Created 05-10-2016 08:00 AM
I am novice to data loading and Avro so I can't verify the avro schema etc. provided in the post but I did face the same issue (java.io.IOException: java.io.IOException: Not a data file). I am listing below the steps I carried out to load a sql server table to avro and then to a Hive external table, maybe, it will provide some pointers :
--create dir. for sqoop import
[hdfs@l1031lab root]$ hadoop fs -mkdir -p /dataload/tohdfs/reio/odpdw/may2016/ [hdfs@l1031lab root]$
--grant permissions to sqoop
[hdfs@l1031lab root]$ hadoop fs -chown -R sqoop /dataload [hdfs@l1031lab root]$ [hdfs@l1031lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/ [hdfs@l1031lab root]$
--sqoop import
sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --as-avrodatafile --num-mappers 8 --table DimSampleDesc --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose
An exception was thrown but the avro files were created :
Writing Avro schema file: /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/DimSampleDesc.avsc
16/05/09 13:09:00 DEBUG mapreduce.DataDrivenImportJob: Could not move Avro schema file to code output directory.
java.io.FileNotFoundException: Destination directory '.' does not exist [createDestDir=true]
at org.apache.commons.io.FileUtils.moveFileToDirectory(FileUtils.java:2865)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.writeAvroSchema(DataDrivenImportJob.java:146)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:92)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)error in moving the schema file, hence, do manually. If this step is skipped, the table will still get created but when you do a 'select' on the table, you get the 'Not a data file' error message.
[sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/* /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/
to allow hive to write, give write permissions to all
[sqoop@l1038lab root]$ hadoop fs -chmod -R a+w /dataload/ [sqoop@l1038lab root]$
view the warehouse-dir
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc Found 15 items -rw-rw-rw- 3 sqoop hdfs 3659 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc -rw-rw-rw- 3 sqoop hdfs 28540 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.class -rw-rw-rw- 3 sqoop hdfs 9568 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.jar -rw-rw-rw- 3 sqoop hdfs 61005 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.java -rw-rw-rw- 3 sqoop hdfs 0 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS -rw-rw-rw- 3 sqoop hdfs 2660 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro -rw-rw-rw- 3 sqoop hdfs 1465072 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro -rw-rw-rw- 3 sqoop hdfs 3577459 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro -rw-rw-rw- 3 sqoop hdfs 1203520 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro -rw-rw-rw- 3 sqoop hdfs 236282 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro -rw-rw-rw- 3 sqoop hdfs 638532 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro -rw-rw-rw- 3 sqoop hdfs 850454 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00006.avro -rw-rw-rw- 3 sqoop hdfs 257025 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00007.avro -rw-rw-rw- 3 sqoop hdfs 341204 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00008.avro -rw-rw-rw- 3 sqoop hdfs 4792 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00009.avro [sqoop@l1038lab root]$
view the schema file
[sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
"type" : "record",
"name" : "DimSampleDesc",
"doc" : "Sqoop import of DimSampleDesc",
"fields" : [ {
"name" : "SmapiName_ver",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName_ver",
"sqlType" : "12"
}, {
"name" : "SmapiColName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiColName",
"sqlType" : "12"
}, {
"name" : "ChartType",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "ChartType",
"sqlType" : "4"
}, {
"name" : "X_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_Indexet",
"sqlType" : "4"
}, {
"name" : "Y_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_Indexet",
"sqlType" : "4"
}, {
"name" : "X_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Tick",
"sqlType" : "-9"
}, {
"name" : "Y_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Tick",
"sqlType" : "-9"
}, {
"name" : "X_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRange",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRange",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "IndexCount",
"sqlType" : "4"
}, {
"name" : "X_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_IndexCount",
"sqlType" : "4"
}, {
"name" : "Y_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_IndexCount",
"sqlType" : "4"
}, {
"name" : "X_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Symbol",
"sqlType" : "-9"
}, {
"name" : "X_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolName",
"sqlType" : "-9"
}, {
"name" : "X_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "Y_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Symbol",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolName",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "SmapiName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName",
"sqlType" : "12"
}, {
"name" : "Incorrect_Ver_FL",
"type" : [ "null", "boolean" ],
"default" : null,
"columnName" : "Incorrect_Ver_FL",
"sqlType" : "-7"
} ],
"tableName" : "DimSampleDesc"
}[sqoop@l1038lab root]$create the hive table in the schema i.e first do a 'use database' on the hive prompt and then create the table :
hive>
> CREATE EXTERNAL TABLE DimSampleDesc ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.37 seconds
hive>--describe the table
hive>
>
> describe formatted DimSampleDesc;
OK
# col_name data_type comment
smapiname_ver string
smapicolname string
charttype int
x_indexet int
y_indexet int
x_tick string
y_tick string
x_tickrange string
x_tickrangefrom string
x_tickrangetom string
y_tickrange string
y_tickrangefrom string
y_tickrangetom string
indexcount int
x_indexcount int
y_indexcount int
x_symbol string
x_symbolname string
x_symboldescr string
y_symbol string
y_symbolname string
y_symboldescr string
smapiname string
incorrect_ver_fl boolean
# Detailed Table Information
Database: odp_dw_may2016
Owner: hive
CreateTime: Mon May 09 16:37:08 CEST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
avro.schema.url hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
numFiles 6
numRows -1
rawDataSize -1
totalSize 8566342
transient_lastDdlTime 1462804628
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.431 seconds, Fetched: 56 row(s)
hive>--execute some select queries
hive>
> select * from DimSampleDesc limit 5;
OK
Abnormal_ShutDown_ct_1 Abnormal_ShutDown_ct_1_000 7 0 0 1 1 1 times counts - Abnormal_ShutDown_ct NULL
Abnormal_ShutDown_ct_2 Abnormal_ShutDown_ct_2_000 7 0 0 1 1 1 % percent - Abnormal_ShutDown_ct NULL
ABS_ActivCOO_ct_1 ABS_ActivCOO_ct_1_000 7 0 0 1 1 1 times counts - ABS_ActivCOO_ct NULL
ABS_ActivCOO_ct_2 ABS_ActivCOO_ct_2_000 1 0 0 0 0 0 0 3 3 0 times counts - ABS_ActivCOO_ct NULL
ABS_ActivCOO_ct_2 ABS_ActivCOO_ct_2_001 1 1 0 1 1 1 1 3 3 0 times counts - ABS_ActivCOO_ct NULL
Time taken: 0.095 seconds, Fetched: 5 row(s)
hive>The core is to have the avsc and avro files in place and ensure proper permissions.
I hope that helps.
Created 05-07-2016 09:31 AM
I'm also having a lot of problems with Avro (see here) and have seen exactly the same problem you are having.
In my case I was sending Avro objects to Kafka and then having Flume transfer them from Kafka to HDFS. If I generated the objects (from Python) using DatumWrite and sent bytes I could not decode them in HDFS or Hive, even if I specified a schema. If I used DataFileWrite (such that the schema was included in the object) and uploaded to HDFS manually, all was fine.
I suspect the problem is with the Avro SerDe. It looks like it is ignoring the supplied schema and always looking in the object for the schema definition.
Created 05-09-2016 10:58 PM
Hi Mark,
Can you please plrovide me a sample how to add Avro schema with avro data??
Thanks
Mamta
Created 05-10-2016 08:00 AM
I am novice to data loading and Avro so I can't verify the avro schema etc. provided in the post but I did face the same issue (java.io.IOException: java.io.IOException: Not a data file). I am listing below the steps I carried out to load a sql server table to avro and then to a Hive external table, maybe, it will provide some pointers :
--create dir. for sqoop import
[hdfs@l1031lab root]$ hadoop fs -mkdir -p /dataload/tohdfs/reio/odpdw/may2016/ [hdfs@l1031lab root]$
--grant permissions to sqoop
[hdfs@l1031lab root]$ hadoop fs -chown -R sqoop /dataload [hdfs@l1031lab root]$ [hdfs@l1031lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/ [hdfs@l1031lab root]$
--sqoop import
sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --as-avrodatafile --num-mappers 8 --table DimSampleDesc --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose
An exception was thrown but the avro files were created :
Writing Avro schema file: /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/DimSampleDesc.avsc
16/05/09 13:09:00 DEBUG mapreduce.DataDrivenImportJob: Could not move Avro schema file to code output directory.
java.io.FileNotFoundException: Destination directory '.' does not exist [createDestDir=true]
at org.apache.commons.io.FileUtils.moveFileToDirectory(FileUtils.java:2865)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.writeAvroSchema(DataDrivenImportJob.java:146)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:92)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)error in moving the schema file, hence, do manually. If this step is skipped, the table will still get created but when you do a 'select' on the table, you get the 'Not a data file' error message.
[sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/* /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/
to allow hive to write, give write permissions to all
[sqoop@l1038lab root]$ hadoop fs -chmod -R a+w /dataload/ [sqoop@l1038lab root]$
view the warehouse-dir
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc Found 15 items -rw-rw-rw- 3 sqoop hdfs 3659 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc -rw-rw-rw- 3 sqoop hdfs 28540 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.class -rw-rw-rw- 3 sqoop hdfs 9568 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.jar -rw-rw-rw- 3 sqoop hdfs 61005 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.java -rw-rw-rw- 3 sqoop hdfs 0 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS -rw-rw-rw- 3 sqoop hdfs 2660 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro -rw-rw-rw- 3 sqoop hdfs 1465072 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro -rw-rw-rw- 3 sqoop hdfs 3577459 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro -rw-rw-rw- 3 sqoop hdfs 1203520 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro -rw-rw-rw- 3 sqoop hdfs 236282 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro -rw-rw-rw- 3 sqoop hdfs 638532 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro -rw-rw-rw- 3 sqoop hdfs 850454 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00006.avro -rw-rw-rw- 3 sqoop hdfs 257025 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00007.avro -rw-rw-rw- 3 sqoop hdfs 341204 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00008.avro -rw-rw-rw- 3 sqoop hdfs 4792 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00009.avro [sqoop@l1038lab root]$
view the schema file
[sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
"type" : "record",
"name" : "DimSampleDesc",
"doc" : "Sqoop import of DimSampleDesc",
"fields" : [ {
"name" : "SmapiName_ver",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName_ver",
"sqlType" : "12"
}, {
"name" : "SmapiColName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiColName",
"sqlType" : "12"
}, {
"name" : "ChartType",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "ChartType",
"sqlType" : "4"
}, {
"name" : "X_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_Indexet",
"sqlType" : "4"
}, {
"name" : "Y_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_Indexet",
"sqlType" : "4"
}, {
"name" : "X_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Tick",
"sqlType" : "-9"
}, {
"name" : "Y_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Tick",
"sqlType" : "-9"
}, {
"name" : "X_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRange",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRange",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "IndexCount",
"sqlType" : "4"
}, {
"name" : "X_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_IndexCount",
"sqlType" : "4"
}, {
"name" : "Y_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_IndexCount",
"sqlType" : "4"
}, {
"name" : "X_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Symbol",
"sqlType" : "-9"
}, {
"name" : "X_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolName",
"sqlType" : "-9"
}, {
"name" : "X_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "Y_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Symbol",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolName",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "SmapiName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName",
"sqlType" : "12"
}, {
"name" : "Incorrect_Ver_FL",
"type" : [ "null", "boolean" ],
"default" : null,
"columnName" : "Incorrect_Ver_FL",
"sqlType" : "-7"
} ],
"tableName" : "DimSampleDesc"
}[sqoop@l1038lab root]$create the hive table in the schema i.e first do a 'use database' on the hive prompt and then create the table :
hive>
> CREATE EXTERNAL TABLE DimSampleDesc ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.37 seconds
hive>--describe the table
hive>
>
> describe formatted DimSampleDesc;
OK
# col_name data_type comment
smapiname_ver string
smapicolname string
charttype int
x_indexet int
y_indexet int
x_tick string
y_tick string
x_tickrange string
x_tickrangefrom string
x_tickrangetom string
y_tickrange string
y_tickrangefrom string
y_tickrangetom string
indexcount int
x_indexcount int
y_indexcount int
x_symbol string
x_symbolname string
x_symboldescr string
y_symbol string
y_symbolname string
y_symboldescr string
smapiname string
incorrect_ver_fl boolean
# Detailed Table Information
Database: odp_dw_may2016
Owner: hive
CreateTime: Mon May 09 16:37:08 CEST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
avro.schema.url hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
numFiles 6
numRows -1
rawDataSize -1
totalSize 8566342
transient_lastDdlTime 1462804628
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.431 seconds, Fetched: 56 row(s)
hive>--execute some select queries
hive>
> select * from DimSampleDesc limit 5;
OK
Abnormal_ShutDown_ct_1 Abnormal_ShutDown_ct_1_000 7 0 0 1 1 1 times counts - Abnormal_ShutDown_ct NULL
Abnormal_ShutDown_ct_2 Abnormal_ShutDown_ct_2_000 7 0 0 1 1 1 % percent - Abnormal_ShutDown_ct NULL
ABS_ActivCOO_ct_1 ABS_ActivCOO_ct_1_000 7 0 0 1 1 1 times counts - ABS_ActivCOO_ct NULL
ABS_ActivCOO_ct_2 ABS_ActivCOO_ct_2_000 1 0 0 0 0 0 0 3 3 0 times counts - ABS_ActivCOO_ct NULL
ABS_ActivCOO_ct_2 ABS_ActivCOO_ct_2_001 1 1 0 1 1 1 1 3 3 0 times counts - ABS_ActivCOO_ct NULL
Time taken: 0.095 seconds, Fetched: 5 row(s)
hive>The core is to have the avsc and avro files in place and ensure proper permissions.
I hope that helps.
Created 05-11-2016 05:40 AM
Thanks @Kaliyug Antagonist, for the solution it worked.
Created 05-19-2016 02:35 PM
can you let me know how did it work for you? Did you give permissions to avsc and avro files and created hive tables to solve above issue?
Created 02-22-2018 01:08 PM
For me, the issue was due to some data already present at external table hdfs location. Since that data was not in avro format table was not able to read it properly.
I just removed it and the table worked fine.