Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to import compressed AVRO files to Impala table?

How to import compressed AVRO files to Impala table?

New Contributor

In my work, I import AVRO files into impala tables by copy the files into HDFS then execute "refresh <table>" in impala.

 
But when I want to do it with compressed files, it didn't work. Here is what I do:
 
1. Enable Hive compress in hive shell:
 
 
hive> set hive.exec.compress.output=true;
hive> set avro.output.codec=bzip2;
 
2. Create a table:
 
 
CREATE TABLE `log_bzip2`(
  `timestamp` bigint COMMENT 'from deserializer',
  `appid` string COMMENT 'from deserializer',
  `clientid` string COMMENT 'from deserializer',
  `statkey` string COMMENT 'from deserializer',
  `expid` string COMMENT 'from deserializer',
  `modid` string COMMENT 'from deserializer',
  `value` double COMMENT 'from deserializer',
  `summary` string COMMENT 'from deserializer',
  `custom` string COMMENT 'from deserializer')
PARTITIONED BY (
  `day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
  'avro.schema.url'='hdfs://szq2.appadhoc.com:8020/user/hive/log.avsc');
 
 
3. Load the compressed AVRO file into HDFS:
 
hdfs dfs -put log.2016-03-07.1457184357726.avro.bz2 /user/hive/warehouse/adhoc_data_fast.db/log_bzip2/2016-03-07
 
4. Add partition and refresh in Impala shell:
 
alter table log_bzip2 add partition (day="2016-03-07") location '/user/hive/warehouse/adhoc_data_fast.db/log_bzip2/2016-03-07/';
 
refresh log_bzip2;
 
5. Query it but not work:
 
 
select * from log_bzip2 limit 10;
Query: select * from log_bzip2 limit 10

WARNINGS: Invalid AVRO_VERSION_HEADER: '42 5a 68 39 '
 
 
The only document I've found about enable compress with avro tables is this link: http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_avro.html#avro_compress... . 
 
How can I do it right? Thanks!

 

1 REPLY 1
Highlighted

Re: How to import compressed AVRO files to Impala table?

Rising Star
Hi, couple of suggestions on attacking the problem by simplifying it somewhat:

1. Can you try with a table and data which is not partitioned?

2. Can you try with external table instead of managed, and use a path different to the hive warehouse?

Martin
Don't have an account?
Coming from Hortonworks? Activate your account here