Created on 04-21-2016 03:16 PM - edited 09-16-2022 03:15 AM
Hi there,
We are running tests on a very small cluster and are seeing extra load caused by the du command. It affects our test results significantly, so for now we are working around it by replacing du with a symbolic link to the df command.
Our testing steps:
1. on all nodes: echo 1 > /proc/sys/vm/drop_caches
2. run scripts
Does anyone have a detailed explanation of how the du command gets triggered by Impala? (We assume it is related to VFS caching.) Is there a config setting, or a better way, to stop it from running du?
Thanks a lot! Let me know if you need more information. 🙂
Created 04-22-2016 09:21 AM
Are you sure it's Impala that's triggering it? I don't think Impala would use du for anything.
HDFS apparently does and Cloudera Manager might use it.
Have you tried tracing back what is running 'du'? E.g. run "ps auxf" to get a tree-view of processes.
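If the tree view is hard to read on a busy node, the same idea can be scripted. Below is a minimal sketch, assuming a Linux box with the standard ps utility; the function names (parse_ps, find_du, ancestry, report) are made up for illustration. It lists every running du process together with its chain of parent processes, so you can see which service spawned it.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -e -o pid= -o ppid= -o args=` output into {pid: (ppid, args)}."""
    table = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pid, ppid = int(parts[0]), int(parts[1])
        args = parts[2] if len(parts) == 3 else ""
        table[pid] = (ppid, args)
    return table


def find_du(table):
    """Return the PIDs whose executable name is exactly `du`."""
    pids = []
    for pid, (_ppid, args) in table.items():
        words = args.split()
        if words and words[0].rsplit("/", 1)[-1] == "du":
            pids.append(pid)
    return sorted(pids)


def ancestry(pid, table):
    """Walk from `pid` up through its parents; returns [(pid, args), ...], child first."""
    chain, seen = [], set()
    while pid in table and pid not in seen:
        seen.add(pid)  # guard against a malformed, cyclic table
        ppid, args = table[pid]
        chain.append((pid, args))
        pid = ppid
    return chain


def report():
    """Snapshot the process table once and print the parents of every `du`."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    table = parse_ps(out)
    for pid in find_du(table):
        for depth, (p, args) in enumerate(ancestry(pid, table)):
            print("  " * depth + f"{p}: {args}")
```

Call report() while the load is happening. du can be short-lived, so you may need to run it in a loop to catch one; the snapshot only shows processes alive at that instant.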
Created 04-21-2016 04:10 PM
Created 04-22-2016 09:40 AM