
Sqoop performance regarding input format

When exporting data with Sqoop from HDFS to an external destination (Teradata, for example), is there a recommended format for the input files?

AFAIK, the supported formats are:

  • Delimited text files
  • Sequence files
  • ORC files

Are there performance differences between these input formats?

Thanks

Accepted Solutions

Re: Sqoop performance regarding input format

@Pierre Villard

Sqoop internally uses YARN jobs to extract data from HDFS. ORC is generally regarded as giving better read performance, including with Hive. You can refer to the link below for details:

http://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-...
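To make the format choice concrete, here is a rough Python sketch that assembles the `sqoop export` arguments for a delimited text input versus an ORC input. It assumes, as far as I know, that ORC data is exported through Sqoop's HCatalog integration (`--hcatalog-table`) rather than `--export-dir`. The helper name `build_sqoop_export_args`, the host, the paths and the table names are all hypothetical, and credentials are left out on purpose.

```python
def build_sqoop_export_args(jdbc_url, target_table, input_format, *,
                            export_dir=None, hcat_table=None,
                            hcat_database="default",
                            field_delimiter=",", num_mappers=4):
    """Return an argument list for `sqoop export` (hypothetical helper).

    input_format is "text" for delimited files in HDFS, or "orc" for an
    ORC-backed table read through HCatalog (an assumption, see above).
    """
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", target_table,
        # Number of parallel map tasks in the YARN job that reads the input.
        "--num-mappers", str(num_mappers),
    ]
    if input_format == "text":
        if export_dir is None:
            raise ValueError("export_dir is required for delimited text input")
        args += ["--export-dir", export_dir,
                 "--input-fields-terminated-by", field_delimiter]
    elif input_format == "orc":
        if hcat_table is None:
            raise ValueError("hcat_table is required for ORC input")
        # With HCatalog the table metadata describes the storage format,
        # so no --export-dir is passed.
        args += ["--hcatalog-database", hcat_database,
                 "--hcatalog-table", hcat_table]
    else:
        raise ValueError("unsupported input format: " + input_format)
    return args


if __name__ == "__main__":
    # Placeholder connection string and names, for illustration only.
    url = "jdbc:teradata://td-host/DATABASE=sales"
    print(" ".join(build_sqoop_export_args(
        url, "ORDERS", "text", export_dir="/data/orders")))
    print(" ".join(build_sqoop_export_args(
        url, "ORDERS", "orc", hcat_table="orders_orc")))
```

Running the same export once per format with an identical `--num-mappers` value and comparing the job durations is probably the most reliable way to see the difference on your own data.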

Hope this helps.

Thanks and Regards,

Sindhu
