What triggers du in Impala?

Hi there,

We are doing some testing on a very small cluster and are seeing extra load from the du command. It is skewing our test results significantly, so we are working around it by replacing du with a symbolic link to df.

Our testing steps:

1. On all nodes: echo 1 > /proc/sys/vm/drop_caches

2. Run our test scripts.
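The two steps above can be sketched in Python for a single node. This is a minimal sketch, assuming it runs as root on each node in turn (fanning out to every node, for example over SSH, is not shown). It calls os.sync() first because the kernel can only drop clean pages. The helper names drop_page_cache and run_cold are hypothetical.

```python
import os
import subprocess
from typing import Sequence

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path: str = DROP_CACHES) -> None:
    """Flush dirty pages, then ask the kernel to drop the page cache.

    "1" frees the page cache only; "2" frees reclaimable slab objects
    (dentries and inodes); "3" frees both. Needs root for the real file.
    """
    os.sync()  # dirty pages cannot be dropped, so write them out first
    with open(path, "w") as f:
        f.write("1\n")


def run_cold(cmd: Sequence[str], cache_path: str = DROP_CACHES) -> int:
    """Drop the page cache on this node, then run one test command."""
    drop_page_cache(cache_path)
    return subprocess.run(list(cmd), check=False).returncode
```

For example, run_cold(["./run_scripts.sh"]) would drop the cache and then start a test, where "./run_scripts.sh" is a placeholder for your own scripts. Writing "1" leaves cached dentries and inodes in place, and those hold much of what a directory walk like du reads; "3" would drop them as well.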

Does anyone have a detailed explanation of how the du command gets triggered by Impala (we assume it's related to VFS caching)? Is there a config setting, or a better way, to stop it from running du?

Thanks a lot! Let me know if you need more information. 🙂

Accepted solution

Are you sure it's Impala that's triggering it? I don't think Impala would use du for anything.

HDFS apparently does, and Cloudera Manager might use it too.

Have you tried tracing back which process is running 'du'? For example, run "ps auxf" to get a tree view of processes.
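One way to follow that suggestion programmatically is to walk /proc and print the parent chain of every du process. This is a hedged, Linux-only sketch: it relies on the /proc/<pid>/stat layout (PID, command name in parentheses, state, parent PID, ...), and the function names are hypothetical.

```python
import os
from typing import Dict, List, Tuple

ProcTable = Dict[int, Tuple[str, int]]  # pid -> (command name, parent pid)


def parse_stat(stat_line: str) -> Tuple[int, str, int]:
    """Return (pid, comm, ppid) from one /proc/<pid>/stat line.

    The command name sits in parentheses and may contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    fields = right.split()  # fields[0] is the state, fields[1] the parent pid
    return int(pid_str), comm, int(fields[1])


def snapshot(proc: str = "/proc") -> ProcTable:
    """Read (comm, ppid) for every process currently listed under /proc."""
    table: ProcTable = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat line was unreadable
        table[pid] = (comm, ppid)
    return table


def ancestry(pid: int, table: ProcTable) -> List[str]:
    """Follow parent links from pid until an unknown pid (e.g. ppid 0)."""
    chain: List[str] = []
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)
        comm, ppid = table[pid]
        chain.append(f"{pid}:{comm}")
        pid = ppid
    return chain


def who_runs(name: str = "du") -> List[List[str]]:
    """Return the ancestor chain of every running process called `name`."""
    table = snapshot()
    return [ancestry(pid, table) for pid, (comm, _) in table.items() if comm == name]
```

Because du is short-lived, a single call to who_runs() may miss it; polling it in a loop while the test runs (or repeating "ps auxf") is more reliable. A JVM shows up only as "java" in the stat command name, so /proc/<pid>/cmdline (or "ps -fp <pid>") is needed to tell whether that parent is, for example, the DataNode (main class org.apache.hadoop.hdfs.server.datanode.DataNode).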

Replies

Also, we don't set fs.du.interval in our config, so it should default to 600000 ms (10 minutes), but we are seeing du run much more often than that.
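To check which interval actually applies, here is a small sketch that reads fs.du.interval from a Hadoop-style XML config file (such as core-site.xml) and falls back to the 600000 ms default, then converts the interval into an expected number of du runs per hour. The scanned_dirs factor is an assumption, not something confirmed in this thread: if the DataNode refreshes disk usage separately for each data directory, a node with several disks would start several du processes per interval.

```python
import xml.etree.ElementTree as ET

DEFAULT_DU_INTERVAL_MS = 600_000  # fs.du.interval default (10 minutes)


def du_interval_ms(config_xml: str) -> int:
    """Return fs.du.interval from Hadoop-style XML, or the default if unset.

    Assumes the value is a plain integer number of milliseconds.
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.du.interval":
            return int(prop.findtext("value", "").strip())
    return DEFAULT_DU_INTERVAL_MS


def expected_runs_per_hour(interval_ms: int, scanned_dirs: int = 1) -> float:
    """du runs per hour if each scanned directory refreshes once per interval.

    scanned_dirs is an assumption to check: how many separate usage
    refreshes the DataNode performs (e.g. one per data directory).
    """
    return scanned_dirs * 3_600_000 / interval_ms
```

With no override, expected_runs_per_hour(du_interval_ms(xml)) is 6.0, so seeing du far more often than about six times an hour per scanned directory would point to more scanned directories than assumed, an override set somewhere other than the file you checked, or another caller.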

It turns out it's the DataNode that's running it. I'll ask about it further as an HDFS topic. Thanks!