Hi dear experts!
I'm wondering how it's possible to accelerate loading one huge file into HDFS.
Let's say I have a 1 TB file on a Linux filesystem and I need to load it into HDFS as fast as possible.
Could someone give me any ideas or recommendations?
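For context, the straightforward single-stream load I'm starting from looks something like this (the paths and block-size value are just examples on my side):

    # single-stream copy from the local FS into HDFS; the dfs.blocksize value is an assumption
    hdfs dfs -D dfs.blocksize=268435456 -put /stage/huge_file.dat /stage/huge_file.dat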
Thank you so much for your reply!
One more question:
when you said:
"For (perhaps, untested) a slight increase in performance: Parallelise and chunk the writes into multiple precise parts of the preferred block size (256 MB)"
did you mean cutting the file on the source side (like the split command in Linux), or something else?
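Just to make my question concrete, here is what I imagine that would look like (untested; the paths, chunk size, and degree of parallelism are my own guesses):

    # cut the source file into 256 MB pieces, matching the suggested block size
    mkdir -p /stage/chunks
    split -b 256M /stage/huge_file.dat /stage/chunks/part_
    # push the pieces into HDFS with 8 parallel puts (parallelism is a guess)
    hdfs dfs -mkdir -p /stage/chunks
    ls /stage/chunks/part_* | xargs -P 8 -I{} hdfs dfs -put {} /stage/chunks/

If I understand correctly, the pieces would then sit in HDFS as separate files, and stitching them back into one file would need something like the FileSystem.concat Java API or the WebHDFS CONCAT operation, is that right?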
And one more question:
I've tried to:
1) mount the source on every node of the Hadoop cluster (over NFS)
2) run the distcp command with the -m option (specifying a few mappers), like: hadoop distcp file:///stage hdfs://stage/
I saw only one mapper, so it seems that Hadoop is not able to create multiple splits when it works with file:///. Is there maybe some workaround or other trick with hadoop distcp? (A sketch of what I'm considering is below.)
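From what I've read, distcp parallelises at file granularity (one mapper per file), so a single source file can never get more than one mapper no matter what -m says; please correct me if that's wrong. The workaround I'm considering (untested; the chunk size, mapper count, and namenode address are placeholders):

    # pre-split the file so distcp has many files to spread across mappers
    split -b 256M /stage/huge_file.dat /stage/chunks/part_
    # with many input files, -m should now be honoured
    hadoop distcp -m 16 file:///stage/chunks hdfs://namenode:8020/stage/chunks

Does that sound like a sensible trick, or is there a better approach?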