
How to capture network stats during hive/mapreduce jobs

I am running Hive jobs using the MapReduce engine. I want to capture network stats to determine latency between DataNodes and racks. Any suggestions on how to capture and analyze them?

Re: How to capture network stats during hive/mapreduce jobs

If you are not satisfied with the stats provided by the ResourceManager and would like to calculate benchmarks, particularly network-related ones, then you need a separate tool such as "collectd".
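To show the kind of raw data such a tool works from, here is a minimal sketch (Linux only, and not collectd's own code) that reads the kernel's per-interface byte counters in /proc/net/dev twice and turns the difference into per-interface throughput. The function names are hypothetical; a tool like collectd adds scheduling, storage and graphing on top of counters like these.

```python
import time


def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; 8 receive columns precede the transmit
        # columns, so field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def sample_throughput(interval=5.0, path="/proc/net/dev"):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} over `interval` seconds."""
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    elapsed = time.monotonic() - start
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:  # skip interfaces that appeared mid-sample
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) / elapsed, (tx1 - tx0) / elapsed)
    return rates
```

For example, calling `sample_throughput(5.0)` on a DataNode while a query runs and dividing the results by 1024 gives KB/s received and sent per interface over those five seconds. Note this measures throughput, not latency.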

Re: How to capture network stats during hive/mapreduce jobs

@Sunile Manjee

I would start by looking at Ambari Metrics.

Another option is nmon (http://nmon.sourceforge.net/pmwiki.php), an easy tool to visualize and export all OS metrics (CPU, memory, disk and network). @Randy Gelhausen is also working on a Zeppelin notebook to visualize exported nmon data (aimed at monitoring all servers in a data center that are not part of the HDP cluster).

Another easy tool I like to use is iperf3 (http://software.es.net/iperf/). However, it generates a lot of network traffic in order to measure maximum bandwidth, so I don't think it's a good idea to run it while your job is executing.
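If what you mainly need is latency between nodes or racks rather than maximum bandwidth, one much lighter alternative (a different technique from iperf3, swapped in here) is to time TCP connection setup from one node to others. Below is a minimal sketch, assuming some TCP port is listening on each target host. A connect time roughly reflects one network round trip plus kernel overhead, so treat it as a relative number for comparing same-rack and cross-rack pairs, not an exact RTT. The function names are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP handshake to host:port in milliseconds; None on failure.

    Pass an IP address rather than a host name so DNS lookup is not timed.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
    except OSError:  # refused, unreachable, or timed out
        return None
    return (time.perf_counter() - start) * 1000.0


def probe(host, port, samples=10, pause=0.1):
    """Return (median_ms, max_ms, failures) over several connection attempts."""
    times, failures = [], 0
    for _ in range(samples):
        ms = tcp_connect_ms(host, port)
        if ms is None:
            failures += 1
        else:
            times.append(ms)
        time.sleep(pause)  # keep the generated traffic negligible
    if not times:
        return None, None, failures
    return statistics.median(times), max(times), failures
```

For example, running `probe("10.0.1.12", 22)` from a node in one rack and comparing it with the same call against a node in another rack gives a rough same-rack versus cross-rack comparison. The address is a placeholder, and port 22 assumes SSH is open; any listening port works.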

Re: How to capture network stats during hive/mapreduce jobs

@Guilherme Braccialli thanks for the feedback. @Randy Gelhausen, is the notebook available yet? Any chance you could share it?

Re: How to capture network stats during hive/mapreduce jobs

Haven't written it up anywhere, but here's my parser for turning nmon files into a table.

It assumes you get complete nmon files written once per minute. I used the following command to write the files to /nmon:

nmon -f -s1 -c1 -m /nmon
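The parser itself isn't included in this thread. As a stand-in (not Randy's code), here is a minimal sketch that pulls the NET (network I/O) section out of nmon spreadsheet-format files into table rows. It assumes the usual layout of `nmon -f` output: a `NET,Network I/O <host>,...` header line naming the per-interface KB/s columns, `ZZZZ,T0001,<time>,<date>` lines timestamping each snapshot, and `NET,T0001,...` data lines. With `-s1 -c1` each file holds a single snapshot, so the files are presumably produced by a separate run scheduled once per minute (for example from cron). The function names and the `*.nmon` glob are assumptions.

```python
import csv
import glob
import io


def parse_nmon_net(text):
    """Extract NET samples as (snapshot_tag, timestamp, metric, value) tuples."""
    columns = []     # metric names from the NET header line
    timestamps = {}  # snapshot tag (T0001, ...) -> "time date" from ZZZZ lines
    samples = []     # (tag, raw values) from NET data lines
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        if fields[0] == "ZZZZ" and len(fields) >= 4:
            timestamps[fields[1]] = f"{fields[2]} {fields[3]}"
        elif fields[0] == "NET" and len(fields) > 2:
            tag = fields[1]
            if tag.startswith("T") and tag[1:].isdigit():
                samples.append((tag, fields[2:]))
            else:
                # Header line, e.g. "NET,Network I/O node1,lo-read-KB/s,..."
                columns = [c for c in fields[2:] if c]
    rows = []
    for tag, values in samples:
        for name, raw in zip(columns, values):
            if raw.strip():
                rows.append((tag, timestamps.get(tag, ""), name, float(raw)))
    return rows


def load_dir(pattern="/nmon/*.nmon"):
    """Parse every matching file; prefix each row with its file path."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rows.extend((path,) + row for row in parse_nmon_net(f.read()))
    return rows
```

Each file's snapshot tag is typically just T0001 here, so the file path (which nmon derives from the host name and start time) is what keeps rows from different runs and hosts apart.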