
Slow BlockReceiver write data to disk


The TestDFSIO write operation takes a long time on an 8-datanode cluster; each datanode has 48 GB RAM, 16 CPU cores, and a 10GigE network. The 10GigE port utilization is only 1.5 to 2 gigabits/sec per datanode during the write operation.
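
For context, a TestDFSIO write run is typically invoked along these lines (the jar name/path and flags vary across Hadoop versions and distributions, and older releases take -fileSize in MB instead of -size, so treat this as an illustrative sketch rather than the exact command used here):

hadoop jar /path/to/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 16 -size 1GB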

 

XFS filesystem config:

 

parted -s /dev/sdb mklabel gpt mkpart /dev/sdb1 xfs 6144s 10.0TB

/sbin/mkfs.xfs -f -L DISK1 -l size=128m,sunit=256,lazy-count=1 -d su=512k,sw=6 -r extsize=256k /dev/sdb1

mkdir /disk1

mount /disk1

/sbin/blockdev --setra 1024 /dev/sdb

df -h /disk1

 

/etc/fstab config:

 

LABEL=DISK1   /disk1    xfs     allocsize=128m,noatime,nobarrier,nodiratime     0       0

 

The /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-hadoop1.com.log.out shows the following warnings:

 

For dfs.blocksize=128m

 

2015-07-14 17:47:16,391 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:614ms (threshold=300ms)

2015-07-14 17:47:16,400 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:623ms (threshold=300ms)

2015-07-14 17:47:16,401 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:548ms (threshold=300ms)

2015-07-14 17:47:16,420 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:567ms (threshold=300ms)

 

For dfs.blocksize=512m

 

2015-07-17 09:46:28,999 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 408ms (threshold=300ms)

2015-07-17 09:46:28,999 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 448ms (threshold=300ms)

2015-07-17 09:46:29,009 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 409ms (threshold=300ms)

2015-07-17 09:46:29,009 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 451ms (threshold=300ms)

 

Please throw some light on how to fix the above issue.

 

regards, 

Karthik


Mentor
The WARNing is meant as a guideline to indicate that you may be facing hardware-level trouble. Your values aren't high enough to be extremely worried about, but to explain each of the two warnings:

> Slow BlockReceiver write data to disk cost

This is measured as the time taken to write to disk when a data packet comes in for a block write. Java-wise, it is just the duration measured around an equivalent of a "FileOutputStream.write(…)" call, which in most setups may not even hit the disk directly and instead goes to the Linux buffer cache.
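
As a rough way to see the difference between page-cache writes and writes that actually reach the device, you can compare a buffered and a direct-I/O dd run on the same data disk (the path and sizes below are illustrative; 128k roughly matches the HDFS packet size):

# Buffered writes -- roughly what the DataNode's write() call hits (page cache):
dd if=/dev/zero of=/disk1/ddtest bs=128k count=8192
# The same amount of data forced to the device with direct I/O, for comparison:
dd if=/dev/zero of=/disk1/ddtest bs=128k count=8192 oflag=direct
rm -f /disk1/ddtest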

It appears that writes to these disks are too slow. I am not well versed in XFS, but we recommend using EXT4.
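
If you want to try EXT4 on the same disk, a minimal sketch reusing the device and label from your setup (note that reformatting destroys the data on the partition):

/sbin/mkfs.ext4 -m 0 -L DISK1 /dev/sdb1
# /etc/fstab entry, mirroring the mount options you already use:
# LABEL=DISK1   /disk1    ext4    noatime,nodiratime     0       0
mount /disk1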

What is the meaning of "allocsize=128m" in your mount options, by the way? Note that we do not write entire blocks at once into an open file; we write packets in a streaming manner (building up to the block size), and packet sizes range from 64 KB to 128 KB each.

> Slow BlockReceiver write packet to mirror

This measures the time taken to write to the next DN over a regular TCP socket, plus the time taken to flush the socket. We forward the same packets here (small sizes, as explained above). An increase in this typically indicates higher network latency, as Java-wise this is a pure SocketOutputStream.write() + SocketOutputStream.flush() cost.
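
To rule out basic latency or throughput problems on the link between a DN and its mirror, simple checks along these lines can help (hadoop2.com is a hypothetical peer hostname; iperf3 is assumed to be installed on both nodes):

ping -c 10 hadoop2.com
# Raw TCP throughput over the 10GigE link:
iperf3 -s                      # run on the mirror datanode
iperf3 -c hadoop2.com -t 30    # run on this datanode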

Hopefully these facts help you tune your configuration better. It is more likely a tuning issue than anything else, given that the hardware is new.

Hello Harsh,
Thanks for your response.
Is there any specific reason to recommend EXT4? I would like to be able to justify whether EXT4 provides better performance than XFS. Can you please provide your input?

Mentor
We do almost all of our internal testing with EXT4, so we're pretty certain of its reliability and performance.

At least a few years ago, XFS had numerous issues that impacted its use with Hadoop-style workloads. While I am sure the situation has improved in more current versions, we do not have any formal XFS tuning recommendations to offer at the moment, and we still recommend EXT4, which has been well tested over the years.

Does a normal/default allocsize give you better performance? I am not sure you should be setting a 128m allocsize.
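
For example, to compare against the default allocation size, drop allocsize=128m from the /etc/fstab line for /disk1 and remount before rerunning the write test:

# /etc/fstab with the allocsize option removed:
# LABEL=DISK1   /disk1    xfs     noatime,nobarrier,nodiratime     0       0
umount /disk1
mount /disk1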


Hello Harsh, 

 

I tried with EXT4 as well; the "Slow BlockReceiver write data to disk" warning is still logged in the datanode log.

 

Do you have any best-practice guidelines for performance tuning at the OS, network, HDFS, server hardware, and filesystem levels? I need performance tuning guidelines for the HDFS block write path, from the OS down to the disk, that will help locate the problematic area. Can you throw some light on this?
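
As a starting point for narrowing down where the time goes, per-disk and per-interface statistics collected while TestDFSIO runs can usually point at the slow component (a rough sketch; assumes the sysstat package is installed):

# Per-disk latency, queue depth, and utilization, refreshed every 2 seconds:
iostat -dmx sdb 2
# Per-interface network throughput:
sar -n DEV 2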

 

regards,
Karthik