
File load strategy for large files (per file volume greater than 1 TB)


Hi,

What should be the strategy for loading very large files (more than 1 TB per file) into HDFS in a reliable, fail-safe manner?


Flume provides fail-safety and reliability, but it is primarily meant for ingesting regularly generated files into HDFS. My understanding is that it works well for ingesting a large number of files, i.e. scenarios where data arrives in mini-batches, but may not be efficient for transferring a single very large file into HDFS. Please let me know if I am wrong here.

Also, the hadoop fs -put command does not provide fail-safety: if the transfer fails partway through, it will not resume or restart the copy on its own.
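
For reference, the best I have come up with using the CLI alone is a naive retry-and-verify wrapper like the sketch below (the paths and retry count are just placeholders); even this still re-copies the whole file from the beginning after a failure rather than resuming:

# Naive retry-and-verify wrapper around hadoop fs -put (illustrative sketch only)
SRC=/data/big_file.dat            # local source file (placeholder path)
DEST=/ingest/big_file.dat         # HDFS destination (placeholder path)

for attempt in 1 2 3; do
    # -f overwrites any partial file left behind by a failed attempt
    hadoop fs -put -f "$SRC" "$DEST" && break
    echo "put failed (attempt $attempt), retrying..." >&2
done

# Basic sanity check: compare local and HDFS sizes in bytes
LOCAL_SIZE=$(stat -c %s "$SRC")
HDFS_SIZE=$(hadoop fs -stat %b "$DEST")
if [ "$LOCAL_SIZE" -ne "$HDFS_SIZE" ]; then
    echo "size mismatch after upload" >&2
    exit 1
fi

A failure at, say, 900 GB still means starting the transfer over from zero, which is exactly what I am hoping to avoid.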


Regards,

Rajib