Sqoop performance regarding input format

If I want to export data with Sqoop from HDFS to an external destination (Teradata, for example), is there a recommended format for the input files?

AFAIK, the supported formats are:

  • Delimited text files
  • Sequence files
  • ORC files

Do we observe performance differences between input formats?

Thanks

1 ACCEPTED SOLUTION

@Pierre Villard

Sqoop internally uses YARN jobs to extract data from HDFS. ORC is regarded as offering better read performance, even with Hive. You can refer to the link below for details:

http://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-...
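
For illustration only, here is a rough sketch of how the two kinds of export invocation might be assembled, so the same table can be timed from each input format. It only builds the argument list and does not run Sqoop. The helper name `sqoop_export_args`, the host, database, table names and paths are all made up for the example, and it assumes ORC data is read through Sqoop's HCatalog integration rather than through `--export-dir`; please check the options against your Sqoop version and Teradata connector.

```python
import shlex
from typing import List


def sqoop_export_args(jdbc_url: str, table: str, *, export_dir: str = "",
                      hcatalog_table: str = "", hcatalog_database: str = "default",
                      num_mappers: int = 4) -> List[str]:
    """Build an argv list for `sqoop export` (hypothetical helper, nothing is executed)."""
    if bool(export_dir) == bool(hcatalog_table):
        raise ValueError("give exactly one of export_dir or hcatalog_table")
    args = ["sqoop", "export", "--connect", jdbc_url, "--table", table,
            "--num-mappers", str(num_mappers)]
    if export_dir:
        # Delimited text files read directly from an HDFS directory.
        args += ["--export-dir", export_dir, "--input-fields-terminated-by", ","]
    else:
        # Assumption: the ORC data backs a Hive table reachable through HCatalog.
        args += ["--hcatalog-database", hcatalog_database,
                 "--hcatalog-table", hcatalog_table]
    return args


if __name__ == "__main__":
    url = "jdbc:teradata://td-host/DATABASE=sales"  # placeholder connection string
    text_job = sqoop_export_args(url, "ORDERS", export_dir="/data/orders_text")
    orc_job = sqoop_export_args(url, "ORDERS", hcatalog_table="orders_orc")
    for job in (text_job, orc_job):
        # Print a shell-quoted command line that can be copied and timed by hand.
        print(" ".join(shlex.quote(a) for a in job))
```

Running both printed commands against the same data with the same `--num-mappers` value would give a like-for-like comparison on your own cluster.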

Hope this helps.

Thanks and Regards,

Sindhu
