Support Questions

pvillard · ‎06-09-2016

If I want to export data from HDFS to an external destination (Teradata, for example) using Sqoop, is there a recommended format for the input files?

AFAIK, the supported formats are:

Delimited text files
Sequence files
ORC files

Are there observable performance differences between these input formats?

Thanks

ssubhas · ‎06-09-2016

@Pierre Villard

Sqoop internally uses YARN jobs to extract data from HDFS. ORC is generally regarded as offering better read performance, as it does with Hive. You can refer to the link below for details:

http://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-...
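For reference, here is a rough sketch of how the two kinds of export are usually invoked. Delimited text is read straight from `--export-dir`, while ORC data is normally exported through Sqoop's HCatalog integration (`--hcatalog-table`) rather than by pointing at the files. The host, database, table names, and HDFS path are hypothetical placeholders, and the script only prints the commands instead of running them. A real Teradata export may also need the appropriate JDBC driver or connector configured.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints two sqoop export commands instead of executing them.
# Every connection detail, table name, and path below is a hypothetical placeholder.

JDBC_URL="jdbc:teradata://td-host.example.com/DATABASE=sales"  # hypothetical
TARGET_TABLE="ORDERS"                                          # hypothetical

# Delimited text input: Sqoop reads the files under --export-dir directly.
text_export_cmd() {
  echo sqoop export \
    --connect "$JDBC_URL" \
    --table "$TARGET_TABLE" \
    --export-dir /data/orders_text \
    --input-fields-terminated-by , \
    --num-mappers 4
}

# ORC input: Sqoop reads the Hive/HCatalog table, which handles the ORC format.
orc_export_cmd() {
  echo sqoop export \
    --connect "$JDBC_URL" \
    --table "$TARGET_TABLE" \
    --hcatalog-database default \
    --hcatalog-table orders_orc \
    --num-mappers 4
}

text_export_cmd
orc_export_cmd
```

Running both variants against the same data with the same `--num-mappers` value is a simple way to compare the formats in your own environment.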

Hope this helps.

Thanks and Regards,

Sindhu
