
Is there a way to create a Hive table based on Avro data directly?


I have a dataset that is almost 600GB in Avro format in HDFS. What is the most efficient way to create a Hive table directly on this dataset?

For smaller datasets, I can copy the data to local disk, use Avro tools to extract the schema, upload the schema to HDFS, and create a Hive table based on that schema. Is there a way to extract the Avro schema directly from a dataset in HDFS without writing Java code?
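
For reference, that smaller-dataset workflow looks roughly like the shell commands below. This is only a sketch: the paths, the table name mydata, and the assumption of Hive 0.14+ (which supports STORED AS AVRO with avro.schema.url) are illustrative, not from the original post.

# copy one Avro file from the dataset to local disk (hypothetical paths)
hadoop fs -get /data/mydata/part-00000.avro /tmp/part-00000.avro

# extract the Avro schema with avro-tools
java -jar avro-tools-1.8.2.jar getschema /tmp/part-00000.avro > /tmp/mydata.avsc

# upload the schema file back to HDFS
hadoop fs -put /tmp/mydata.avsc /user/hive/schemas/mydata.avsc

# create an external Hive table over the data, reading the schema from HDFS
hive -e "CREATE EXTERNAL TABLE mydata
STORED AS AVRO
LOCATION '/data/mydata'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/mydata.avsc');"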

1 ACCEPTED SOLUTION

Master Mentor

You can try the following: cat your large file and grab the first few lines into a new file on the local filesystem. I'll be curious to know whether that works with Avro serialization.

http://stackoverflow.com/questions/22852063/how-to-copy-first-few-lines-of-a-large-file-in-hadoop-to...

Then use avro-tools to extract the schema.
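
A minimal sketch of that idea, assuming a hypothetical file /data/mydata/part-00000.avro. Since an Avro container file stores its schema in the file header, copying just the beginning of the file should be enough for getschema, but it is worth verifying on your own data:

# copy only the beginning of one Avro file to the local filesystem (path and byte count are arbitrary)
hadoop fs -cat /data/mydata/part-00000.avro | head -c 1048576 > /tmp/sample.avro

# extract the schema from the truncated local copy
java -jar avro-tools-1.8.2.jar getschema /tmp/sample.avro > /tmp/mydata.avsc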


10 REPLIES

Contributor

hadoop jar avro-tools-1.8.2.jar getschema hdfs_archive/mydoc.avro

would also have done the job.

Instead of java -jar, you can run it directly against HDFS with:

hadoop jar avro-tools-1.8.2.jar getschema hdfsPathTOAvroFile.avro
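
As a usage note (the paths below are hypothetical), the output can be redirected straight into a schema file and pushed back to HDFS, so the data never has to be copied to local disk:

# extract the schema directly from the file in HDFS and save it locally
hadoop jar avro-tools-1.8.2.jar getschema /data/mydata/part-00000.avro > mydata.avsc

# upload the schema so a Hive table can reference it via avro.schema.url
hadoop fs -put mydata.avsc /user/hive/schemas/mydata.avsc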