Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Why Standalone NiFi (PutHDFS) is slow ingesting small files in HDFS


I'm having a problem with PutHDFS writing slowly to HDFS.

[Attached screenshots: 15042-file1.png, 15043-file2.png]

1 ACCEPTED SOLUTION

Master Guru

Have you tried installing the Hadoop client on the same machine as NiFi and then sending a similar file to HDFS from the command line? In most cases the issue here is the network or the disk.


@Bryan Bende

By the way, most of the files are small, around 10 to 50 MB.

Upon testing:

350 files, 10 MB per file, 1 Gbps network

Client to Server (SSH) - 36 secs

[admin@client1 data]$ cat transfer.logs
Fri May 5 09:42:40 +08 2017
Fri May 5 09:43:16 +08 2017
36 secs

Server to HDFS - 4,479 secs, or about 1.24 hrs

[hadoop@master1 data]$ ./puthdfs-stdf.sh
Fri May 5 09:55:53 +08 2017
Fri May 5 11:10:32 +08 2017
4479 sec
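
To put the two timings side by side, here is a rough throughput calculation using only the numbers above (350 files x 10 MB, 36 s over SSH, 4479 s into HDFS). The function name `throughput_mb_per_s` is made up for this sketch. It works out to roughly 97 MB/s over SSH, which is close to what a 1 Gbps link can carry, versus roughly 0.78 MB/s into HDFS.

```shell
#!/bin/bash
# Average throughput in MB/s for a transfer of total_mb megabytes in secs seconds.
# awk does the division because shell arithmetic is integer-only.
throughput_mb_per_s() {
    local total_mb=$1 secs=$2
    awk -v mb="$total_mb" -v s="$secs" 'BEGIN { printf "%.2f\n", mb / s }'
}

total_mb=$((350 * 10))                # 350 files x 10 MB each = 3500 MB
throughput_mb_per_s "$total_mb" 36    # client to server over SSH
throughput_mb_per_s "$total_mb" 4479  # server to HDFS
```

Since the same data moved over the same kind of network in 36 seconds, the gap suggests the bottleneck is on the HDFS write path rather than the client link.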

[client@master1 data]$ cat puthdfs-stdf.sh
#!/bin/bash
# Time how long it takes to put all .txt files in this directory into HDFS.
start=$(date +%s)
date
hdfs dfs -put *.txt /data/test/testing/
date
end=$(date +%s)
runtime=$((end - start))
echo "$runtime"
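
The script above only gives one total for all 350 files. A possible variation, sketched below, times each file separately so you can see whether every put is slow or only some of them. The function name `put_each_timed` is hypothetical; it relies only on `hdfs dfs -put` as used in the script above. Each `hdfs dfs` call starts its own client process, so the per-file times include some fixed startup overhead.

```shell
#!/bin/bash
# Upload each file separately and print "<seconds> <file>" for each one.
# dest is the HDFS target directory (the thread uses /data/test/testing/).
put_each_timed() {
    local dest=$1
    shift
    local f start end
    for f in "$@"; do
        start=$(date +%s)
        hdfs dfs -put "$f" "$dest" || echo "put failed: $f" >&2
        end=$(date +%s)
        echo "$((end - start)) $f"
    done
}

# Example (not run here): show the slowest files first
# put_each_timed /data/test/testing/ *.txt | sort -rn | head
```

If a handful of files account for most of the time, that points at specific blocks or DataNodes; if every file is uniformly slow, the whole write pipeline is suspect.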

I haven't installed the Hadoop client on the NiFi server (a Windows server).


@Bryan Bende Thanks!

The slow HDFS put was caused by a slow BlockReceiver, as shown in the DataNode log:

2017-06-05 13:17:34,210 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(571)) - Slow BlockReceiver write packet to mirror took 6316ms (threshold=300ms)
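
For anyone checking their own cluster for the same symptom, here is a minimal sketch that counts these warnings in a DataNode log and reports the longest delay. It assumes the log lines look like the one above; the function name `summarize_slow_blockreceiver` and the log path in the usage comment are hypothetical. A "write packet to mirror" delay is the time spent forwarding a packet to the next DataNode in the write pipeline, so many large values would typically point at the network between DataNodes or at a slow downstream node.

```shell
#!/bin/bash
# Read DataNode log lines on stdin and summarize "Slow BlockReceiver" warnings.
# Prints "<count> <max_ms>", or "0 0" if there are none.
summarize_slow_blockreceiver() {
    grep 'Slow BlockReceiver' |
        sed -n 's/.* took \([0-9][0-9]*\)ms.*/\1/p' |
        awk 'BEGIN { n = 0; max = 0 }
             { n++; if ($1 + 0 > max) max = $1 + 0 }
             END { print n, max }'
}

# Example (hypothetical log path, adjust for your install):
# summarize_slow_blockreceiver < /path/to/hadoop-hdfs-datanode.log
```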