
Re: metrics monitor

Super Collaborator

Hi, can anyone help me figure out what I can do here?

Thanks

Re: metrics monitor

Contributor

Can you attach all the logs (ambari-server, ambari-agent, namenode, datanode and other services) from the 6 nodes? I just want to check whether the nodes are able to communicate. How many live datanodes can you see on the console? Also check whether you can get to the <namenodehost>:50070 web UI (the NameNode UI, not the Ambari UI) and look at the live nodes there. Then load a file into HDFS and run hdfs fsck <filename> -files -blocks to check whether your data is distributed across the nodes. Otherwise there is probably a problem with HDFS.
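For example (a minimal sketch; /tmp/fsck-test is just a placeholder path, use any file you like):

# copy a small local file into HDFS
hdfs dfs -put /etc/hosts /tmp/fsck-test
# list its files and blocks; -locations additionally prints which datanodes hold each replica
hdfs fsck /tmp/fsck-test -files -blocks -locations

If every block location points at the same datanode, the data is not being distributed.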

Re: metrics monitor

Super Collaborator

Hi:

All the logs from the Ambari Metrics collectors look fine:

2016-09-13 21:44:23,317 [INFO] emitter.py:91 - server: http://xxxxxxx:6188/ws/v1/timeline/metrics
2016-09-13 21:45:23,326 [INFO] emitter.py:91 - server: http://xxxxxxx:6188/ws/v1/timeline/metrics
2016-09-13 21:46:23,334 [INFO] emitter.py:91 - server: http://xxxxxxx:6188/ws/v1/timeline/metrics
2016-09-13 21:47:23,344 [INFO] emitter.py:91 - server: http://xxxxxxx:6188/ws/v1/timeline/metrics
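Those emitter lines only show the monitor posting to the collector on port 6188. To double-check that the collector itself responds, it can be queried directly (host masked as in the log above; the metadata path is from the AMS REST API, so treat the exact endpoint as an assumption for your version):

curl http://xxxxxxx:6188/ws/v1/timeline/metrics/metadata

If that returns JSON, the collector side is up.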

INFO 2016-09-13 21:49:00,916 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:00,996 ActionQueue.py:99 - Adding STATUS_COMMAND for service HBASE of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,075 ActionQueue.py:99 - Adding STATUS_COMMAND for service HBASE of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,161 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,243 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster rsicluster01 to the queue.

INFO 2016-09-13 21:49:01,320 ActionQueue.py:99 - Adding STATUS_COMMAND for service HIVE of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,404 ActionQueue.py:99 - Adding STATUS_COMMAND for service HIVE of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,486 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,566 ActionQueue.py:99 - Adding STATUS_COMMAND for service OOZIE of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,645 ActionQueue.py:99 - Adding STATUS_COMMAND for service PIG of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,729 ActionQueue.py:99 - Adding STATUS_COMMAND for service SLIDER of cluster rsicluster01 to the queue.

INFO 2016-09-13 21:49:01,808 ActionQueue.py:99 - Adding STATUS_COMMAND for service SPARK of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,898 ActionQueue.py:99 - Adding STATUS_COMMAND for service SQOOP of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:01,977 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:02,059 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:02,139 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:02,217 ActionQueue.py:99 - Adding STATUS_COMMAND for service ZOOKEEPER of cluster rsicluster01 to the queue.
INFO 2016-09-13 21:49:12,302 Heartbeat.py:78 - Building Heartbeat: {responseId = 40777, timestamp = 1473796152302, commandsInProgress = False, componentsMapped = True}
INFO 2016-09-13 21:49:12,312 Controller.py:268 - Heartbeat response received (id = 40778)
INFO 2016-09-13 21:49:22,312 Heartbeat.py:78 - Building Heartbeat: {responseId = 40778, timestamp = 1473796162312, commandsInProgress = False, componentsMapped = True}
INFO 2016-09-13 21:49:22,354 Controller.py:268 - Heartbeat response received (id = 40779)

But in all the logs I can see this error (tail -200f /var/log/hadoop/hdfs/hadoop-hdfs-datanode-xxxxxxx.log):

2016-09-13 21:50:55,815 ERROR datanode.DataNode (DataXceiver.java:run(278)) - lxxxxxxxx:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:56332 dst: /127.0.0.1:50010
java.io.EOFException
	at java.io.DataInputStream.readShort(DataInputStream.java:315)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
	at java.lang.Thread.run(Thread.java:745)
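Both src and dst here are 127.0.0.1, so something local is opening the DataNode port and closing the socket without sending a data-transfer opcode; readShort then hits end-of-stream. A bare TCP probe reproduces exactly this log line (a sketch, assuming nc is available; any port-based health check, such as a monitor's liveness probe, has the same effect):

# connect to the DataNode port and close immediately, sending no data
nc -z 127.0.0.1 50010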




But the MapReduce and Spark jobs are working, so what is the problem?

Re: metrics monitor

Super Collaborator

All my nodes are working perfectly:

(screenshots attached: 7602-snip20160913-2.png, 7603-snip20160913-3.png, 7604-snip20160913-4.png)

Also I can get to the namenode:50070 web UI:

(screenshot attached: 7605-snip20160913-5.png)

So, everything is working fine.

Re: metrics monitor

Contributor

@Roberto Sancho

From the screenshot you've attached, I see disk usage on only one node. Can you try running "hdfs fsck <filename> -files -blocks" and check whether the data is distributed across multiple nodes? If it is, what is there to fix?
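A quick way to check both things from the shell (a sketch; /tmp/fsck-test is just the placeholder path from the earlier suggestion):

# replication factor of the test file
hdfs dfs -stat %r /tmp/fsck-test
# every live datanode with its capacity and DFS used; uneven distribution shows up immediately
hdfs dfsadmin -report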

Re: metrics monitor

Super Collaborator

Everything is working now. The problem was only that the metrics monitors were not running on all the nodes; the nodes themselves were fine all along.
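In case it helps anyone else: the fix was making sure the Metrics Monitor process runs on every node (Ambari UI > Ambari Metrics > Service Actions, or from the shell). A hedged sketch, assuming a typical HDP layout; the paths and the ams user may differ by version:

# check the monitor on each node
su - ams -c '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf status'
# start it where it is down
su - ams -c '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start'
# then watch its log for the emitter lines shown above
tail -f /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out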