Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Is there a way to create a Hive table directly from Avro data?


I have a dataset of almost 600 GB in Avro format in HDFS. What is the most efficient way to create a Hive table directly on this dataset?

For smaller datasets, I can copy the data to local disk, use avro-tools to extract the schema, upload the schema to HDFS, and create a Hive table based on that schema. Is there a way to extract the Avro schema directly from a dataset in HDFS without writing Java code?

1 ACCEPTED SOLUTION

Master Mentor

You can try the following: cat your large file, take the first few lines, and write them to a new file on the local filesystem. I'd be curious to know whether that works with Avro serialization.

http://stackoverflow.com/questions/22852063/how-to-copy-first-few-lines-of-a-large-file-in-hadoop-to...

Then use avro-tools to extract schema.
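The approach above could be sketched roughly as follows. An Avro container file stores its schema in the file header, so a byte-limited prefix of the file is usually enough for getschema, provided the prefix covers the whole header. The function name, the 1 MiB default, and the paths in the comments are assumptions for illustration, not something from the original post.

```shell
# Copy only the first N bytes of an Avro file out of HDFS to a local file.
# Usage: sample_avro_header <hdfs_path> <local_out> [bytes]
sample_avro_header() {
    src=$1
    out=$2
    bytes=${3:-1048576}   # assumed default of 1 MiB; raise it if the schema is very large
    # Cut by bytes rather than lines, since Avro is a binary format.
    # hdfs may print a broken-pipe style message once head stops reading.
    hdfs dfs -cat "$src" | head -c "$bytes" > "$out"
}

# Hypothetical usage (paths are placeholders):
#   sample_avro_header /data/events/part-00000.avro sample.avro
#   java -jar avro-tools-1.8.2.jar getschema sample.avro > schema.avsc
```

If getschema fails on the sample, the prefix was probably too short to contain the full header; retrying with a larger byte count is the first thing to check.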


10 REPLIES 10

New Member

hadoop jar avro-tools-1.8.2.jar getschema hdfs_archive/mydoc.avro

would also do the job.


Instead of java -jar, you can run avro-tools directly against the file in HDFS with:

hadoop jar avro-tools-1.8.2.jar getschema hdfsPathTOAvroFile.avro
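Once the schema is extracted, the remaining step from the original question (creating the Hive table from it) might look like the sketch below. It assumes Hive 0.14 or later, where STORED AS AVRO and the avro.schema.url table property are available. The function name, table name, and all paths are hypothetical placeholders.

```shell
# Print HiveQL for an external Avro table whose columns come from a schema file.
# Usage: write_avro_ddl <table> <hdfs_data_dir> <hdfs_schema_url>
write_avro_ddl() {
    cat <<EOF
CREATE EXTERNAL TABLE $1
STORED AS AVRO
LOCATION '$2'
TBLPROPERTIES ('avro.schema.url'='$3');
EOF
}

# Hypothetical end-to-end usage (paths are placeholders):
#   hadoop jar avro-tools-1.8.2.jar getschema /data/events/part-00000.avro > events.avsc
#   hdfs dfs -put events.avsc /schemas/events.avsc
#   write_avro_ddl events /data/events hdfs:///schemas/events.avsc > create_table.hql
#   hive -f create_table.hql
```

Because the table is external and points at the existing directory, the 600 GB of data stays in place; only the small schema file is written.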