
What does dfs.datanode.DatanodeNetworkErrors mean in Grafana?

According to the documentation, "Network Errors" means the rate of network errors on the JVM.

But I am a little confused about the JVM here. Is the JVM the container? If so, does this value represent the rate of dropped packets when connecting to containers on the DataNodes?

https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/grafana_hdfs_d...


@Randy Huang

Yes, the DataNode runs in a JVM, so the details you get from the "NETWORK ERRORS / GC COUNT" section are for the JVM in which the DataNode is running.

This panel reads the DataNode's "dfs.datanode.DatanodeNetworkErrors" metric, which is described as "Count of network errors on the datanode". https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache...

Similarly, for the GC count it reads the DataNode's "jvm.JvmMetrics.GcCount" metric.

Thanks for your answer. One more question: how is this value calculated?

I ask because I am unsure how to read the values on the vertical axis.

[Attachment: image-11.jpg]


@Randy Huang

It watches for read/write errors on connections handled through the "DataXceiverServer". Whenever it hits such an error while reading the next operation, it increments the error counter by calling incrDatanodeNetworkErrors() in the following method.

  /**
   * Read/write data from/to the DataXceiverServer.
   */
  @Override
  public void run() {
    int opsProcessed = 0;
    Op op = null;

    try {
      dataXceiverServer.addPeer(peer, Thread.currentThread(), this);
      peer.setWriteTimeout(datanode.getDnConf().socketWriteTimeout);
      InputStream input = socketIn;
      try {
        IOStreamPair saslStreams = datanode.saslServer.receive(peer, socketOut,
          socketIn, datanode.getXferAddress().getPort(),
          datanode.getDatanodeId());
        input = new BufferedInputStream(saslStreams.in,
          HdfsConstants.SMALL_BUFFER_SIZE);
        socketOut = saslStreams.out;
      } catch (InvalidMagicNumberException imne) {
        if (imne.isHandshake4Encryption()) {
          LOG.info("Failed to read expected encryption handshake from client " +
              "at " + peer.getRemoteAddressString() + ". Perhaps the client " +
              "is running an older version of Hadoop which does not support " +
              "encryption");
        } else {
          LOG.info("Failed to read expected SASL data transfer protection " +
              "handshake from client at " + peer.getRemoteAddressString() + 
              ". Perhaps the client is running an older version of Hadoop " +
              "which does not support SASL data transfer protection");
        }
        return;
      }
      
      super.initialize(new DataInputStream(input));
      
      // We process requests in a loop, and stay around for a short timeout.
      // This optimistic behaviour allows the other end to reuse connections.
      // Setting keepalive timeout to 0 disable this behavior.
      do {
        updateCurrentThreadName("Waiting for operation #" + (opsProcessed + 1));

        try {
          if (opsProcessed != 0) {
            assert dnConf.socketKeepaliveTimeout > 0;
            peer.setReadTimeout(dnConf.socketKeepaliveTimeout);
          } else {
            peer.setReadTimeout(dnConf.socketTimeout);
          }
          op = readOp();
        } catch (InterruptedIOException ignored) {
          // Time out while we wait for client rpc
          break;
        } catch (IOException err) {
          // Since we optimistically expect the next op, it's quite normal to get EOF here.
          if (opsProcessed > 0 &&
              (err instanceof EOFException || err instanceof ClosedChannelException)) {
            if (LOG.isDebugEnabled()) {
              LOG.debug("Cached " + peer + " closing after " + opsProcessed + " ops");
            }
          } else {
            incrDatanodeNetworkErrors();
            throw err;
          }
          break;
        }
        // ... (remainder of the loop and method omitted from this excerpt)


Please note the branch that increments the counter:

          } else {
            incrDatanodeNetworkErrors();
            throw err;
          }
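To make the branch above easier to reason about, here is a minimal standalone sketch of the same decision: which exceptions end up incrementing the counter. It is not Hadoop code. The class name, the countsAsNetworkError helper, and the AtomicLong counter are hypothetical stand-ins, and the rule is only my reading of the quoted excerpt.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.atomic.AtomicLong;

public class NetworkErrorRuleSketch {
    // Hypothetical stand-in for the DataNode's network error counter.
    static final AtomicLong networkErrors = new AtomicLong();

    /** Returns true when the quoted excerpt would increment the counter. */
    static boolean countsAsNetworkError(IOException err, int opsProcessed) {
        // Timeouts are caught earlier (InterruptedIOException) and only end the loop.
        if (err instanceof InterruptedIOException) {
            return false;
        }
        // After at least one op, EOF or a closed channel is an expected close
        // of a reused connection, so it is not counted.
        boolean expectedClose = opsProcessed > 0
            && (err instanceof EOFException || err instanceof ClosedChannelException);
        return !expectedClose;
    }

    static void record(IOException err, int opsProcessed) {
        if (countsAsNetworkError(err, opsProcessed)) {
            networkErrors.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        record(new EOFException(), 3);                          // expected close: not counted
        record(new EOFException(), 0);                          // EOF before the first op: counted
        record(new IOException("Connection reset by peer"), 2); // counted
        record(new SocketTimeoutException(), 0);                // timeout: not counted
        System.out.println("network errors = " + networkErrors.get());
    }
}
```

So, under this reading, the metric counts failed reads of the next operation on a data-transfer connection. It is not a count of dropped packets.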


Reference Code:

https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-hdfs-project/hadoop-hdfs/src/main/jav...
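On the vertical-axis question: the code above only ever increments a counter, while the dashboard documentation calls the panel a rate. I have not checked how the Grafana panel derives its value. Assuming it plots the change in the counter per second between two samples, the arithmetic would look roughly like the sketch below. The class name, method name, and sample values are made up for illustration.

```java
public class CounterRateSketch {
    /**
     * Per-second rate between two samples of an ever-increasing counter.
     * Timestamps are in milliseconds.
     */
    static double ratePerSecond(long prevCount, long prevMillis,
                                long currCount, long currMillis) {
        long elapsedMillis = currMillis - prevMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        long delta = currCount - prevCount;
        if (delta < 0) {
            // Assumption: a smaller value means the counter restarted from zero
            // (for example after a DataNode restart), so count from zero.
            delta = currCount;
        }
        return delta * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken one minute apart: 100 errors, then 106.
        double rate = ratePerSecond(100, 0, 106, 60_000);
        System.out.println("errors per second = " + rate);
    }
}
```

If the panel does work this way, it would explain fractional values on the vertical axis: a handful of errors spread over a sampling interval gives a rate well below 1.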

