Created on 09-02-2016 08:18 AM - edited 09-16-2022 03:37 AM
I have an Avro schema that works and queries fine in Hive, but when we query the same table in Impala it throws an error saying:
"Your query has the following error(s):
Could not connect to <HOST NAME>:21050"
This is weird, and I don't see any log information in Hue, even though I am able to browse other tables in the same database that are stored in TEXT format.
We are on impala-2.1.3+cdh5.3.3+0.
One thing that looks weird about my schema when I describe it in Impala is that the first row is NULL, like below, and I am not sure why!
My Avro schema file is defined like below:
It's just a sample, and all the fields are defined as STRING.
PS: Not sure why my pics have these weird colors.
Created 09-06-2016 10:27 AM
I suggest checking the log directory for crash information in impalad.INFO or impalad.FATAL. If you find any, can you please share it?
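To check those files quickly, here is a minimal Python sketch that scans impalad.INFO and impalad.FATAL for error- and fatal-level lines. It assumes the files use the glog line format (a severity letter I/W/E/F followed by MMDD and a timestamp) and that you pass the impalad log directory yourself; /var/log/impalad below is only an assumed typical location, so check your own configuration. The function name find_crash_lines is hypothetical.

```python
import os
import re

# glog-style line prefix: severity letter, then MMDD and hh:mm:ss,
# e.g. "F0906 10:27:00.123456 ...". Only E (error) and F (fatal) match.
SEVERE_LINE = re.compile(r"^[EF]\d{4} \d{2}:\d{2}:\d{2}")


def find_crash_lines(log_dir, names=("impalad.INFO", "impalad.FATAL")):
    """Return (file name, line number, text) for every E/F line found."""
    hits = []
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.exists(path):
            continue  # a missing impalad.FATAL is normal if nothing crashed
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if SEVERE_LINE.match(line):
                    hits.append((name, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed location; replace with your actual impalad log directory.
    for name, lineno, line in find_crash_lines("/var/log/impalad"):
        print(f"{name}:{lineno}: {line}")
```

An empty result only means no E/F lines were logged; it does not rule out other problems such as the metadata issue discussed later in this thread.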
Created 09-06-2016 01:42 PM
I did, kwho; I don't see any entries in the files you listed.
Created 09-07-2016 05:58 PM
It looks like your table metadata is in a strange state. How did you create the table exactly? Did you alter the table (e.g. add/remove columns)?
Created 09-12-2016 02:31 PM
Sorry for replying late.
Alex,
I didn't alter the table or its columns. This is how I created the table:
CREATE TABLE MI_FULL
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='hdfs://path/filename.avsc')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs://path/filename.avsc');
I then inserted data into this table. Please let me know if there is anything wrong with creating it this way.
Created 09-12-2016 05:20 PM
Thanks for following up!
I'm pretty sure your table should work on more recent versions of Impala, since we've fixed several Avro issues related to how schemas are defined.
As a workaround, you could try the following:
1. In your .avsc file, make all fields nullable by specifying each type as a union of null and the type, like this:
"type": ["null", "int"]
2. Also specify matching column definitions in your CREATE TABLE statement, e.g.:
CREATE TABLE MI_FULL (col1 INT, col2 STRING, ...)
ROW FORMAT SERDE
(the rest is exactly the same as before)
Let me know if you have questions and whether those workarounds helped!
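For a wide table, editing every field for step 1 by hand is tedious. Below is a minimal Python sketch that does it programmatically. It assumes the .avsc is a single top-level record whose fields are listed under "fields"; the function name make_nullable and the two-column schema are hypothetical, since the real MI_FULL schema isn't shown in this thread.

```python
import copy
import json


def make_nullable(schema):
    """Return a copy of an Avro record schema with every field nullable.

    A plain field type T becomes the union ["null", T]; an existing union
    is reordered so "null" comes first. Every field gets "default": null
    (overwriting any previous default), because Avro requires a union
    field's default to match the union's first branch.
    """
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in result["fields"]:
        ftype = field["type"]
        if isinstance(ftype, list):
            others = [t for t in ftype if t != "null"]
            field["type"] = ["null"] + others
        else:
            field["type"] = ["null", ftype]
        field["default"] = None  # json.dumps writes this as null
    return result


# Hypothetical two-column schema for illustration only.
schema = {
    "type": "record",
    "name": "mi_full",
    "fields": [
        {"name": "col1", "type": "int"},
        {"name": "col2", "type": "string"},
    ],
}

print(json.dumps(make_nullable(schema), indent=2))
```

You would then write the result back to the .avsc on HDFS and use the same column names and types in the CREATE TABLE from step 2.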
Created 09-16-2016 10:12 AM
I am trying out the options you suggested, Alex; I should have my results by today.
Created 09-19-2016 08:09 AM
Voilà, it works fine: with the tables defined the way you suggested, I can view the data in Impala.
Thanks very much, Alex.
Created 09-19-2016 09:52 AM
Thanks for following up and confirming that it works!
Created 10-06-2016 09:07 AM
But I just want to note that this defeats the purpose of using an Avro schema: for any schema change, we have to update the Avro schema file and also drop and recreate the table using the new schema for it to work.
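One way to soften this, though it does not remove the drop-and-recreate step on this Impala version, is to treat the .avsc as the single source of truth and generate the CREATE TABLE column list from it, so the two can't drift apart. The Python below is a sketch under stated assumptions: a flat record schema whose fields are primitive types or ["null", T] unions, a type mapping limited to six primitives, and hypothetical helper names (column_defs, create_table_ddl).

```python
import json

# Avro primitive -> Impala column type, limited to the primitives whose
# mapping is unambiguous; anything else is rejected rather than guessed.
AVRO_TO_IMPALA = {
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
}


def column_defs(schema):
    """Derive 'name TYPE' column definitions from an Avro record schema."""
    cols = []
    for field in schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, list):
            # Nullable union such as ["null", "int"]: use the non-null branch.
            non_null = [t for t in ftype if t != "null"]
            if len(non_null) != 1:
                raise ValueError(f"unsupported union for {field['name']}: {ftype}")
            ftype = non_null[0]
        if not isinstance(ftype, str) or ftype not in AVRO_TO_IMPALA:
            raise ValueError(f"unsupported type for {field['name']}: {ftype}")
        cols.append(f"{field['name']} {AVRO_TO_IMPALA[ftype]}")
    return cols


def create_table_ddl(table, schema, schema_url):
    """Build the CREATE TABLE statement with explicit, schema-derived columns."""
    cols = ",\n  ".join(column_defs(schema))
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'\n"
        f"WITH SERDEPROPERTIES ('avro.schema.url'='{schema_url}')\n"
        "STORED AS INPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'\n"
        f"TBLPROPERTIES ('avro.schema.url'='{schema_url}');"
    )


if __name__ == "__main__":
    # In practice you would json.load() the same .avsc stored on HDFS;
    # this inline two-column schema is hypothetical.
    avsc = json.loads(
        '{"type": "record", "name": "mi_full", "fields": ['
        '{"name": "col1", "type": ["null", "int"], "default": null},'
        '{"name": "col2", "type": ["null", "string"], "default": null}]}'
    )
    print(create_table_ddl("MI_FULL", avsc, "hdfs://path/filename.avsc"))
```

With this, a schema change means editing only the .avsc, regenerating the DDL, and running DROP TABLE plus the generated CREATE TABLE.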