Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Is there a way to create Hive table based on Avro data directly ?

Solved Go to solution

Is there a way to create Hive table based on Avro data directly ?

I have a dataset that is almost 600GB in Avro format in HDFS. Whay is the most efficient way to create a Hive table directly on this dataset ?

For smaller datasets, I can move my data to disk, use Avro tools to extract schema, upload schema to HDFS and create Hive table based on that schema. Is there a way to directly extract Avro schema from a dataset in HDFS without writing java code ?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Is there a way to create Hive table based on Avro data directly ?

Mentor

You can try the following, cat your large file, grab a few lines output to new file on local fs. Ill be curious to know if that works with avro serialization.

http://stackoverflow.com/questions/22852063/how-to-copy-first-few-lines-of-a-large-file-in-hadoop-to...

Then use avro-tools to extract schema.

10 REPLIES 10

Re: Is there a way to create Hive table based on Avro data directly ?

CREATE EXTERNAL TABLE avro_hive_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'TBLPROPERTIES
('avro.schema.url'='hdfs://localdomain/user/avro/schemas/activity.avsc')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'/user/avro/applog_avro';

Re: Is there a way to create Hive table based on Avro data directly ?

Thanks @Ancil McBarnett @Sunile Manjee. I don't have .avsc file for schema. How can I extract Avro schema for this data ?

Re: Is there a way to create Hive table based on Avro data directly ?

Super Collaborator

@Ancil McBarnett

Are you sure AVRO backed tables can be created as external tables? If I run your statement I get problems on the LOCATION predicate. Hive does not expect the LOCATION clause it seems

Edit:

Never mind, you can but the order of the statements matters:

This works:

CREATE EXTERNAL TABLE as_avro 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
LOCATION '/user/root/as_avro' 
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/root/avro.avsc');

Re: Is there a way to create Hive table based on Avro data directly ?

Super Guru

Try this:

CREATE EXTERNAL TABLE  tableName 
PARTITIONED BY (ingestiondatetime BIGINT, recordtype STRING)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serd2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/file.avsc');

Re: Is there a way to create Hive table based on Avro data directly ?

Super Collaborator

This statement misses the 'LOCATION' clause so it is not an external table

Re: Is there a way to create Hive table based on Avro data directly ?

Mentor

You can try the following, cat your large file, grab a few lines output to new file on local fs. Ill be curious to know if that works with avro serialization.

http://stackoverflow.com/questions/22852063/how-to-copy-first-few-lines-of-a-large-file-in-hadoop-to...

Then use avro-tools to extract schema.

Re: Is there a way to create Hive table based on Avro data directly ?

Thank You @Artem Ervits @Sunile Manjee @Ancil McBarnett. I was able to get my solution using a combination of commands.

hdfs dfs -cat $1 | head --bytes 10K > $SAMPLE_FILE 
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE 
hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR

head command needs to be used with --bytes option to get first 10K bytes. Then I used Avro tools to retrieve schema and copied schema back to HDFS.

Re: Is there a way to create Hive table based on Avro data directly ?

Mentor

You taught me something new, awesome community!

Re: Is there a way to create Hive table based on Avro data directly ?

New Contributor

Perhaps this page can help you:

create hive table based avro

Don't have an account?
Coming from Hortonworks? Activate your account here