Support Questions


Is it possible to export Avro files from Hadoop to Teradata with Sqoop? And if yes, how?

New Contributor

Hi.

I'm asking because we imported data from Teradata to HDFS with Sqoop. I found many examples on the internet showing how to do it; it works and everything seems to be OK. I used the --as-avrodatafile option and also know where to find the .avsc file.
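
For reference, here is roughly what our import command looked like, using the plain Teradata JDBC driver (host, database, user, and table names are placeholders):

sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=demo \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table MAILINGLIST \
  --target-dir /tmp/rdbms \
  --as-avrodatafile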

But now I am trying to export the same data back to my Teradata table. I found many examples of how to use Sqoop's export for .csv files, and exporting those files to Teradata worked. But I couldn't find anywhere how to do it when you have Avro files.
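
The export that worked for the CSV files was along these lines (same placeholder names as above):

sqoop export \
  --connect jdbc:teradata://td-host/DATABASE=demo \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table MAILINGLIST \
  --export-dir /user/etl/mailinglist_csv \
  --input-fields-terminated-by ','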

Can anyone help me out, for example with a link to a tutorial or a code example?

1 ACCEPTED SOLUTION

Master Mentor

Interesting question; I should try it out myself. That said, if you can't figure it out on your own or don't get other responses, try creating a Hive schema on top of your Avro files, then run the following statement in Hive to create a new table in text (or ORC) format from your original Avro table. From there you can export with Sqoop the same way you would for any text data; a sketch of that final export step follows the table definition below.

-- CTAS: copy the Avro-backed table into a new table in text (or ORC) format
CREATE TABLE newtable STORED AS TEXTFILE AS SELECT * FROM avrotable;

-- sample Avro-backed Hive table

CREATE DATABASE DEMO;
USE DEMO;
CREATE EXTERNAL TABLE mailinglist
  COMMENT "just drop the schema right into the HQL"
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/tmp/rdbms'
  TBLPROPERTIES (
    'avro.schema.literal'='{
      "type": "record",
      "name": "mailinglist",
      "namespace": "any.data",
      "fields": [
        {
          "name": "id",
          "type": [
            "null",
            "int"
          ]
        },
        {
          "name": "first_name",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "last_name",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "email",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "gender",
          "type": [
            "null",
            "string"
          ]
        }
      ]
    }');
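
From there the export is plain Sqoop. A minimal sketch, assuming the new table landed in text format under the default warehouse path with Hive's default ^A (\001) field delimiter; the connection details are placeholders and I haven't tried this against Teradata myself:

sqoop export \
  --connect jdbc:teradata://td-host/DATABASE=demo \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table MAILINGLIST \
  --export-dir /apps/hive/warehouse/demo.db/newtable \
  --input-fields-terminated-by '\001' \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'

Newer Sqoop releases also have HCatalog integration (--hcatalog-database / --hcatalog-table), which picks up the table's format and delimiters for you and may let you export straight from an ORC copy.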


2 REPLIES


New Contributor

Thank you, Artem!

I think it is a good idea to do it this way if there is no other possibility. We use the Avro files as a backup before we update the data on Teradata, and if we just need some information, we can get it through the Hive schema on top of the Avro files.

Usually we don't need to write the data back to Teradata, but a backup without a way to restore the data would be useless.