I have set up a job that runs several Hadoop commands, including 'hdfs dfsadmin' and 'hdfs dfs -du'. Could these be taxing on my cluster if I run them every 5 minutes, or would that be harmless? I run them this often because the job parses their output into a structured format for a Hive table, so that we can create historical reports about our system.
@Chad Woodhedad, yes, `fs -du` is expensive compared to other read operations. Running it every 5 minutes is probably overkill; you can run it less frequently, e.g. once an hour.
`hdfs dfsadmin -report` is also expensive compared to typical read operations.
We've occasionally seen these calls affect NameNode performance when buggy monitoring scripts invoke them many times per second. Barring that, you should be fine.
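If it helps, here is a rough Python sketch of how such a collection job might parse `hdfs dfs -du` output and refuse to run more often than a chosen interval. The names `parse_du_line`, `should_run`, `collect_du` and `MIN_INTERVAL_SECONDS` are hypothetical, and the parser assumes each output line is either `size path` or `size disk_space_consumed path`, which may differ between Hadoop versions, so check it against what your cluster actually prints.

```python
import subprocess

# Assumption: once an hour, as suggested above; tune for your cluster.
MIN_INTERVAL_SECONDS = 3600


def parse_du_line(line):
    """Parse one line of `hdfs dfs -du` output into a dict.

    Assumes either "size path" or "size disk_space_consumed path".
    Returns None for lines that match neither layout.
    """
    parts = line.split(None, 2)
    if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
        return {"size": int(parts[0]),
                "consumed": int(parts[1]),
                "path": parts[2].strip()}
    parts = line.split(None, 1)
    if len(parts) == 2 and parts[0].isdigit():
        return {"size": int(parts[0]),
                "consumed": None,
                "path": parts[1].strip()}
    return None


def should_run(last_run, now, min_interval=MIN_INTERVAL_SECONDS):
    """Guard against a buggy scheduler invoking the command too often."""
    return last_run is None or now - last_run >= min_interval


def collect_du(path):
    """Run `hdfs dfs -du <path>` once and return the parsed rows."""
    result = subprocess.run(["hdfs", "dfs", "-du", path],
                            capture_output=True, text=True, check=True)
    rows = (parse_du_line(line) for line in result.stdout.splitlines())
    return [row for row in rows if row is not None]
```

The job would call `should_run(last_run, time.time())` before `collect_du(...)` and store the timestamp of the last successful run somewhere durable, so that a misbehaving scheduler cannot hammer the NameNode.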