Member since: 06-08-2017
Posts: 27
Kudos Received: 9
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5284 | 04-18-2016 11:50 AM
07-11-2016
01:26 PM
2 Kudos
Currently looking at implementing an Avro Schema Registry that we can integrate with Kafka. Any recommendations / implementations anyone can point to?
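For context, the kind of integration we're after: something like Confluent's Schema Registry, where producers register Avro schemas against a subject over a REST API. A rough Python sketch of registering a schema (the host, port, subject name and record fields below are assumed, not from any real setup):

import json
import requests

# Assumed registry location and subject name.
REGISTRY = "http://schema-registry-host:8081"
SUBJECT = "test-topic-value"

avro_schema = {
    "type": "record",
    "name": "TestRecord",
    "fields": [{"name": "id", "type": "long"}],
}

# Confluent's registry expects the Avro schema as an escaped JSON string
# inside a {"schema": "..."} envelope.
resp = requests.post(
    "{0}/subjects/{1}/versions".format(REGISTRY, SUBJECT),
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(avro_schema)}),
)
print(resp.json())  # e.g. {"id": 1} on success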
Labels:
- Apache Kafka
- Schema Registry
05-13-2016
02:54 PM
Hi @Predrag Minovic Thanks for taking the time to look into this. I had come to much the same conclusion, but all the information I had seen online seemed to suggest that Hive could access a schema-less Avro object provided the schema was supplied via the avro.schema.url parameter in TBLPROPERTIES.
05-10-2016
03:16 PM
Associated Hive code. The Avro files are stored in /user/hue/testdata/avro_data/avro.avro etc.

DROP TABLE IF EXISTS avro_test;
CREATE EXTERNAL TABLE avro_test
COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/hue/testdata/avro_data'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///user/hue/testdata/avro.avsc'
);
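For reference, a minimal sketch of what the referenced avro.avsc could look like (the record and field names here are purely illustrative, not taken from the actual data):

{
  "type": "record",
  "name": "AvroTest",
  "namespace": "example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}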
05-10-2016
03:12 PM
@Predrag Minovic Avro file and associated schema attached: avrobug.zip
05-10-2016
02:54 PM
The issue I am having is that I have Avro objects that don't have the schema in the header. When I try to access these objects by specifying the schema via the avro.schema.url parameter in TBLPROPERTIES, I am unable to access the data. However, if the Avro object includes the schema, I have no problem. I'm pretty sure the Avro objects are OK, as I can extract the data from them using avro-tools and providing the same schema. So what I would be interested in seeing is whether someone can load a schema-less Avro object into HDFS and then get a Hive table to access it by providing the schema file.
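As a point of reference, a rough sketch of decoding a schema-less datum in Python by supplying the schema externally, which is essentially what avro-tools does when given a schema (standard avro package; file names are illustrative, and avro.schema.parse is Parse in newer releases):

import avro.schema
from avro.io import DatumReader, BinaryDecoder

# The writer's schema has to be supplied explicitly, because the raw
# datum bytes carry no container header and hence no embedded schema.
schema = avro.schema.parse(open("avro.avsc", "rb").read())
with open("datum.avro", "rb") as f:
    record = DatumReader(schema).read(BinaryDecoder(f))
print(record)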
05-10-2016
02:37 PM
@Predrag Minovic So I've been battling with Avro and Hive for days and am getting nowhere. See thread here and here. If you can shed any light, it would be greatly appreciated.
05-07-2016
09:31 AM
I'm also having a lot of problems with Avro (see here) and have seen exactly the same problem you are having. In my case I was sending Avro objects to Kafka and then having Flume transfer them from Kafka to HDFS. If I generated the objects (from Python) using DatumWriter and sent the raw bytes, I could not decode them in HDFS or Hive, even if I specified a schema. If I used DataFileWriter (so that the schema was included in the object) and uploaded to HDFS manually, all was fine. I suspect the problem is with the Avro SerDe. It looks like it is ignoring the supplied schema and always looking in the object for the schema definition.
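To make the distinction concrete, a rough sketch of the two Python write paths (standard avro package; the schema file and record fields are illustrative, and avro.schema.parse is Parse in newer releases):

import io
import avro.schema
from avro.io import DatumWriter, BinaryEncoder
from avro.datafile import DataFileWriter

schema = avro.schema.parse(open("avro.avsc", "rb").read())
record = {"id": 1, "name": "test"}

# 1) DatumWriter + BinaryEncoder: raw datum bytes with no container header,
#    so no embedded schema; a reader must be given the schema separately.
buf = io.BytesIO()
DatumWriter(schema).write(record, BinaryEncoder(buf))
schemaless_bytes = buf.getvalue()

# 2) DataFileWriter: a proper Avro container file with the schema in the
#    header; readers can decode it without any external schema.
writer = DataFileWriter(open("with_schema.avro", "wb"), DatumWriter(), schema)
writer.append(record)
writer.close()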
05-06-2016
11:10 AM
1 Kudo
I have a number of simple Avro objects stored in HBase and am trying to access them from Hive. I've set up a Hive table by following the instructions that I found here. Basically, in Hive I do:

DROP TABLE IF EXISTS HBaseAvro;
CREATE EXTERNAL TABLE HBaseAvro
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,event:pCol",
"event.pCol.serialization.type" = "avro",
"event.pCol.avro.schema.url" = "hdfs:///tmp/kafka/avro/avro.avsc")
TBLPROPERTIES (
"hbase.table.name" = "avro",
"hbase.mapred.output.outputtable" = "avro",
"hbase.struct.autogenerate" = "true");

If the Avro object contains the schema in the header, I have no problem and can access the data. However, if the Avro object does NOT contain the schema, then when I try to access it I get an IO exception: {"message":"H170 Unable to fetch results. java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating event_pcol... If I do a DESCRIBE on the Hive table, I can see the table correctly, in that event_pcol is shown as a struct with the correct fields. I've tried moving the avsc file to check that the CREATE TABLE is working, and Hive correctly complains. With the CREATE as above the table appears to be created correctly and I can access the "key" values, so the problem appears to be with the Avro object. To me it looks like Hive is not using the schema definition passed in the avro.schema.url parameter. I've tried including the schema as a schema.literal parameter and it still fails. Any ideas?
Labels:
- Apache HBase
- Apache Hive
05-04-2016
05:14 PM
Follow the link in my thread that refers to the ticket that has been raised; it looks like there is already a fix. Unfortunately I don't have an environment to build and test it.
05-04-2016
05:04 PM
1 Kudo
@Steven Cardella NiFi, Avro and Kafka are proving to be "entertaining". There is a fix for the Snappy compression issue in the pipeline, as mentioned in my previous post.