
If I am moving a huge file from the GetFile processor to HDFS with the PutHDFS processor, can we multithread it so that the copy is faster?

New Contributor

Hi, I want to move a file from the local file system to HDFS using NiFi.

I am using a GetFile --> PutHDFS flow.

Can we speed up the copy (let's say of a big file) by adding parallelism or multithreading?

Thanks in advance for your answer!


Super Guru

@Ganesh Ganjare

Do you want to speed up the get step? That has to be supported by your file system. Ignore NiFi for a moment. Can you write a Java program that reads a file from the local file system with multiple threads? Yes. But what would actually happen? The file sits on one spinning disk, and that is your limiting factor. Each thread would make the disk seek to a different location, which makes the read slower than it would be with one thread. That is why you should read one file sequentially with one thread.
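
To make the "yes, you can, but it may not help" point concrete, here is a minimal sketch of such a Java program. It is not how NiFi's GetFile works internally; the class and method names (ParallelFileRead, readWithThreads, readRange) are made up for illustration. Each worker reads one contiguous byte range of the same file through a shared FileChannel using positional reads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only.
class ParallelFileRead {

    /** Reads the whole file with the given number of threads; returns the total bytes read. */
    static long readWithThreads(Path file, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long chunk = (size + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                long start = Math.min(size, i * chunk);
                long end = Math.min(size, start + chunk);
                // Each task reads its own half-open range [start, end).
                parts.add(pool.submit(() -> readRange(channel, start, end)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for every range before closing the channel
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** Reads bytes in [start, end) with positional reads, which do not move the channel's own position. */
    static long readRange(FileChannel channel, long start, long end) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        long position = start;
        while (position < end) {
            buffer.clear();
            buffer.limit((int) Math.min(buffer.capacity(), end - position));
            int n = channel.read(buffer, position);
            if (n < 0) {
                break; // unexpected end of file
            }
            position += n;
        }
        return position - start;
    }
}
```

Calling readWithThreads(file, 1) and readWithThreads(file, 4) on the same large file and timing both is a simple way to check the claim on your own hardware. Keep in mind that the operating system's page cache can hide the disk cost on a second run, so the numbers are only meaningful for a file that is not already cached.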

Now, if you have a system like HDFS, which distributes one file across multiple disks and multiple nodes, then you can use multiple threads in parallel to read parts of the file from each disk. Notice, though, that on a single disk the operation is still single-threaded (one mapper per disk).
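
As a rough illustration of why a distributed file can be read in parallel, the sketch below only does the arithmetic of cutting a file into block-sized byte ranges, so that each range could go to a separate reader. It does not talk to HDFS at all. The BlockSplits class is hypothetical, and in a real cluster the block locations come from the NameNode rather than from a calculation like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; no Hadoop API is used here.
class BlockSplits {

    /** Returns half-open byte ranges {start, end}, one per block of the file. */
    static List<long[]> split(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += blockSize) {
            long end = Math.min(fileSize, start + blockSize); // last block may be shorter
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }
}
```

For example, with an assumed block size of 128 MB, a 300 MB file yields three ranges, the last one covering only the remaining 44 MB. Each range can then be read sequentially by its own reader, which is the one-reader-per-disk pattern described above.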

So no, your get operation cannot be multithreaded.