Sqoop performance regarding input format

If I want to export data with Sqoop from HDFS to an external destination (Teradata, for example), is there a recommended format for the input files?

AFAIK, the supported formats are:

  • Delimited text files
  • Sequence files
  • ORC files

Do we observe performance differences between input formats?

Thanks

1 ACCEPTED SOLUTION

@Pierre Villard

Sqoop internally uses YARN jobs to extract the data from HDFS. ORC is generally regarded as giving better read performance, even with Hive. You can refer to the link below for details:

http://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-...
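
To make the comparison concrete, here is a rough sketch (not taken from the linked slides) of how the input format changes the `sqoop export` invocation. It assumes delimited text is read straight from an HDFS directory with `--export-dir`, while ORC data is read through Sqoop's HCatalog integration with `--hcatalog-table`. The helper name `sqoop_export_cmd`, the JDBC URL, the paths, and the table names are all hypothetical placeholders, and connector-specific options for Teradata are left out.

```python
import shlex


def sqoop_export_cmd(jdbc_url, table, *, export_dir=None, hcatalog_table=None,
                     field_delim=",", mappers=4):
    """Assemble a `sqoop export` argument list for one input layout.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if (export_dir is None) == (hcatalog_table is None):
        raise ValueError("pass exactly one of export_dir or hcatalog_table")

    cmd = ["sqoop", "export",
           "--connect", jdbc_url,
           "--table", table,
           "--num-mappers", str(mappers)]

    if export_dir is not None:
        # Delimited text files, read directly from an HDFS directory.
        cmd += ["--export-dir", export_dir,
                "--input-fields-terminated-by", field_delim]
    else:
        # ORC (or another Hive-managed format), read through HCatalog.
        cmd += ["--hcatalog-table", hcatalog_table]
    return cmd


# Placeholder connection string and names, for illustration only.
JDBC_URL = "jdbc:teradata://td-host/DATABASE=sales"

text_cmd = sqoop_export_cmd(JDBC_URL, "ORDERS",
                            export_dir="/data/orders_text")
orc_cmd = sqoop_export_cmd(JDBC_URL, "ORDERS",
                           hcatalog_table="orders_orc")

print(shlex.join(text_cmd))
print(shlex.join(orc_cmd))
```

Running both variants against the same data with the same number of mappers, and comparing the job times, is probably the most reliable way to see whether the format matters for your export.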

Hope this helps.

Thanks and Regards,

Sindhu
