New Contributor
Posts: 1
Registered: ‎12-23-2015

Sqoop as Avro file. Missing avsc files

I used Sqoop to import a table and stored it in Avro format through an Oozie workflow. The Sqoop job ran fine and the .avro files are in the specified directory. I am trying to find out where Oozie stores the .avsc file that contains the schema. Could someone please help me locate the .avsc file?

Posts: 1,567
Kudos: 289
Solutions: 240
Registered: ‎07-31-2013

Re: Sqoop as Avro file. Missing avsc files

If you read the Avro format spec, you'll realise that the schema for an Avro file is present in the file's own header: http://avro.apache.org/docs/current/spec.html#Object+Container+Files.
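To make the point about the header concrete, here is a minimal sketch in plain Python (standard library only) that reads the metadata map at the start of an object container file and returns the JSON stored under the `avro.schema` key. It follows the layout described in the spec linked above: a 4-byte magic, a map of string keys to byte values, then a 16-byte sync marker. The function names are made up for this illustration and are not part of any Avro library; for real work avro-tools or an Avro client library is the safer choice.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one zigzag-encoded, variable-length Avro long."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        value = byte[0]
        accum |= (value & 0x7F) << shift
        if not value & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (also used for string keys)."""
    length = read_long(stream)
    return stream.read(length)


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero-count block terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(stream)  # block size in bytes, not needed here
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def embedded_schema(stream):
    """Return the writer schema from the file header as parsed JSON."""
    return json.loads(read_header_metadata(stream)["avro.schema"])
```

For a local copy of a data file (the file name here is hypothetical), usage would look like `with open("part-m-00000.avro", "rb") as f: print(json.dumps(embedded_schema(f), indent=2))`. A file that lives in HDFS would have to be fetched or streamed first.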

There's usually no need for a separate schema file unless you want to modify the reader schema. See http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_avro_usage.h... for the broader topic of using Avro files across various components.

If you do want to extract a schema out into a separate file for whatever reason, use the avro-tools command on the file:

~> avro-tools getschema hdfs://namenode-host:port/path/to/file.avro > file.avsc
~> cat file.avsc
Backline Customer Operations Engineer
New Contributor
Posts: 3
Registered: ‎07-08-2016

Re: Sqoop as Avro file. Missing avsc files

If you want to create a Hive table, you still need the .avsc file, right?

Of course you can extract it again from the header of the Avro data file, but having the .avsc file stored somewhere in HDFS saves us from having to create additional shell actions (and the additional processing cost of running them).
