New Contributor
Posts: 6
Registered: ‎10-09-2013

Is Navigator actually in the quickstart VM? How does it appear, as 'Audit'? No fs -cat events seen.


I'm trying to see how the current version of Navigator works.

The website says that the quickstart VM contains Navigator, but I'm not seeing 'Navigator' anywhere in that running VM. I see only 'Auditing', which may be Navigator itself, but I don't see any events from command-line Hadoop file access (e.g. "hadoop fs -cat /hdfs_filename").
Is Navigator actually in the quickstart VM?

I've tried to install Enterprise CDH4 on two explicitly supported platforms, CentOS 6.4 and Ubuntu 12.04, and each time it fails to get a 'heartbeat from the Agent', despite running as root and no other process using the ports noted in the error (7182, 9000 and 9001).
... Is there a VM which is loaded with Navigator, or some way I can see Navigator running?

Thanks a lot -RN

Cloudera Employee
Posts: 435
Registered: ‎07-12-2013

I'm afraid Navigator is not actually in the QuickStart VM. Navigator is only available in Enterprise or 60-day Trial versions of CM, and the QuickStart VM is a Free installation. I'll see about getting the website corrected. I see that erroneous information on this page (https://www.cloudera.com/content/support/en/downloads.html), but did you see it on any other pages? Thanks for letting us know about that.

 

You should be able to get it running on one of your installations without too much trouble - perhaps I can help. Can you check the logs in /var/log/cloudera-scm-agent and /var/log/cloudera-scm-server for any errors?

New Contributor
Posts: 6
Registered: ‎10-09-2013

Hi, thanks for your quick reply! Thank you in advance for any insight you have on the info below. I used root, gave the root password to CM, restarted CM and used host=localhost, etc. Things collapse after about 10 minutes with "no heartbeat from the Agent".

 

Yes, I have lots of logs, and the errors (when there are any) claim no heartbeat and 'port 9000 not free' for instance, but that port was free, ... I checked w/ netstat -plten and even manually created a listener on it with nc to make sure.
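
For anyone retracing this, the netstat and nc checks above can be approximated in Python. This is only a sketch under stated assumptions: the helper names are hypothetical, and it is not the check that the agent or CherryPy performs. It separates two questions that are easy to conflate: whether something is already listening on a host/port pair, and whether a new process could bind it. A listener on 0.0.0.0 occupies the port on every local address, so it helps to probe the same host name that appears in the error message rather than only localhost.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if something is already listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or the host name did not resolve.
        return False


def port_can_be_bound(host, port):
    """Return True if this process could bind host:port right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Typically "address already in use" or "permission denied".
        return False
    finally:
        sock.close()
```

For example, port_accepts_connections("127.0.0.1", 9000) and port_can_be_bound("127.0.0.1", 9000) could be run just before pressing Retry, once for each of the ports named in the error.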

Also, on different "Retry" attempts I'd see those ports sometimes open, other times with listeners. Once, the listener seemed to be left over from a prior Retry, and when I stopped it the next Retry succeeded in binding to the port, but I still got "no heartbeat" errors ... one time without any errors in the logs whatsoever.

 

Here is what I find to be relevant, below.  

I should emphasize that I've tried without firewalls, with host localhost, installing on CentOS 6.4 on 1) a clean VM with nothing else significant running and 2) a standalone real machine; 3) a colleague has also tried and failed; and 4) I've tried on Ubuntu 12.04. All my attempts lead to the same "no heartbeat from Agent" error.

 

I saw this on one occasion, despite the fact that these files did exist when I checked immediately after seeing the errors:
>>Error: could not find config file /run/cloudera-scm-agent/supervisor/supervisord.conf 
>>Error: could not find config file /run/cloudera-scm-agent/supervisor/supervisord.conf 

 

and here is a larger section of the logs...

[08/Oct/2013 17:00:06 +0000] 7658 MainThread metrics INFO Importing tasktracker metric schema from file /usr/lib/cmf/agent/src/cmf/monitor/tasktracker/schema.json
[08/Oct/2013 17:00:06 +0000] 7658 MainThread dns_names INFO Using timeout of 2.000000
[08/Oct/2013 17:00:06 +0000] 7658 MainThread __init__ INFO Importing metric schema from file /usr/lib/cmf/agent/src/cmf/monitor/schema.json
[08/Oct/2013 17:00:06 +0000] 7658 MainThread agent INFO Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CMF_PACKAGE_DIR': '/usr/lib/cmf/service', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY JOBTRACKER_IN_STANDBY_MODE', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_HUE_HOME': '/usr/share/hue', 'CDH_HIVE_HOME': '/usr/lib/hive', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CDH_FLUME_HOME': '/usr/lib/flume-ng', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_HCAT_HOME': '/usr/lib/hcatalog'}
[08/Oct/2013 17:00:06 +0000] 7658 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[08/Oct/2013 17:00:06 +0000] 7658 MainThread agent INFO Created /run/cloudera-scm-agent/process

 

[08/Oct/2013 17:00:07 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:07] ENGINE Bus STARTING
[08/Oct/2013 17:00:07 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:07] ENGINE Started monitor thread '_TimeoutMonitor'.
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging ERROR [08/Oct/2013:17:00:12] ENGINE Error in 'start' listener <bound method Server.start of <cherrypy._cpserver.Server object at 0x193ffd0>>
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/wspbus.py", line 197, in publish
output.append(listener(*args, **kwargs))
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/_cpserver.py", line 151, in start
ServerAdapter.start(self)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 167, in start
wait_for_free_port(*self.bind_addr)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 410, in wait_for_free_port
raise IOError("Port %r not free on %r" % (port, host))
IOError: Port 9000 not free on 'lykos-localhost'

[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging ERROR [08/Oct/2013:17:00:12] ENGINE Shutting down due to error in start listener:
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/wspbus.py", line 235, in start
self.publish('start')
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/wspbus.py", line 215, in publish
raise exc
ChannelFailures: IOError("Port 9000 not free on 'lykos-localhost'",)

[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:12] ENGINE Bus STOPPING
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:12] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('lykos-localhost', 9000)) already shut down
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:12] ENGINE Stopped thread '_TimeoutMonitor'.
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:12] ENGINE Bus STOPPED
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:12] ENGINE Bus EXITING
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:12] ENGINE Bus EXITED
[08/Oct/2013 17:04:55 +0000] 8301 MainThread agent INFO No command line vars
[08/Oct/2013 17:04:55 +0000] 8301 MainThread agent INFO Missing database jar: /usr/share/java/mysql-connector-java.jar (normal, if you're not using this database type)
[08/Oct/2013 17:04:55 +0000] 8301 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
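
Since the agent log mixes INFO noise with the entries that matter, a small filter can pull out just the ERROR and WARNING lines. The sketch below is an illustration, not a Cloudera tool: the regular expression is inferred only from the log lines quoted in this post (bracketed timestamp, PID, thread, module, level, message), and the function name is made up.

```python
import re

# Layout inferred from the excerpt above:
# [timestamp] PID thread module LEVEL message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<thread>\S+)\s+"
    r"(?P<module>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def find_problems(lines, levels=("ERROR", "WARNING")):
    """Yield (timestamp, pid, module, level, message) for matching log lines.

    Lines that do not fit the layout (traceback frames, blank lines) are skipped.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            yield (
                match.group("timestamp"),
                int(match.group("pid")),
                match.group("module"),
                match.group("level"),
                match.group("message"),
            )
```

It can be pointed at an open file, e.g. find_problems(open("/var/log/cloudera-scm-agent/cloudera-scm-agent.log")), and the PID in each result makes it easier to tell which Retry attempt an error belongs to.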

 

 

Here is my full set of notes ... tedious, but for completeness to show the complexity of the process.

Tues Oct 8 2013

Installation on Ubuntu, exactly their supported version (12.04 Precise, 64-bit), fails
with the same error as on CentOS_6.4 64-bit (also exactly listed as 'supported')

The website for CM for Centos only lists one for 5.

Using their quickstart VM (CentOS 6.4 which is listed as having Navigator)

their latest CM installer
http://archive.cloudera.com/cm4/installer/latest/
started on the quickstart VM but failed:

Failed to start Embedded Service and Configuration Database, See /var/log/cloudera-manager-installer/5.start-embedded-db.log for details. Click OK to revert this installation.

 

 

Distributor ID: Ubuntu
Description: Ubuntu 12.04.3 LTS
Release: 12.04
Codename: precise

 

Popup within 30 seconds:
Refreshing repository metadata failed.
See /var/log/cloudera-manager-installer/2.refresh-repo.log for details. Click OK to revert this installation.

which has:

Hit http://archive.getdeb.net precise-getdeb/apps amd64 Packages
Hit http://archive.getdeb.net precise-getdeb/apps i386 Packages
Ign http://archive.getdeb.net precise-getdeb/apps TranslationIndex
Ign http://archive.getdeb.net precise-getdeb/apps Translation-en_US
Ign http://archive.getdeb.net precise-getdeb/apps Translation-en
Fetched 198 B in 14s (13 B/s)
W: Failed to fetch http://hudson-ci.org/debian/binary/Packages 404 Not Found

E: Some index files failed to download. They have been ignored, or old ones used instead.

The next popup says:
Unable to remove cloudera-manager-repository. It will need to be removed manually with dpkg. See /var/log/cloudera-manager-installer/3.remove-cloudera-manager-repository.log for details. Click OK to revert this installation

And the contents of that are only:
dpkg: error: dpkg status database is locked by another process

Then:
Fatal Error:
Installation failed.

 

OK, so I fix the hudson problem: their source URL is incorrect, and Cloudera has a hair trigger and collapses on any issue.
So now it gets further:

Your browser should now open to http://localhost:7180/. Log in to Cloudera Manager with the username and password set to 'admin' to continue installation.

but then another popup says:
System program problem detected
Do you want to report the problem now?

and I say yes,
it says Java crashed, I took screenshots,
then said OK, and get another popup:
Chromium can not be run as root
Please start Chromium as a normal user. To run as root, you must specify an alternate --user-data-dir for storage of profile information.

So now FFox opens (as root) and says it can't find the URL: http://localhost:7180/
Unable to connect, and I see there is no listener now at 7180

Firefox can't establish a connection to the server at localhost:7180.

The site could be temporarily unavailable or too busy. Try again in a few moments.
If you are unable to load any pages, check your computer's network connection.
If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.


The "Next step" icon, which says to open a browser, won't close; the close button darkens but there is no response.
But after I close FFox, the popup does close,
and the first installation menu popup now says
Finish
Installation successful.

so I press OK.

Chrome (as me) won't open localhost:7180, no listener, and their daemon is dead:

root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:38:44 => service cloudera-scm-server status
Checking for service cloudera-scm-server: * cloudera-scm-server is dead and pid file exists

So I cycle through and get it running, and see a listener now:
root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:40:08 => service cloudera-scm-server stop
Stopping cloudera-scm-server: /sbin/start-stop-daemon: warning: failed to kill 5474: No such process
* cloudera-scm-server stopped
root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:40:16 => service cloudera-scm-server status
Checking for service cloudera-scm-server: * cloudera-scm-server is not running
root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:40:21 => service cloudera-scm-server start
Starting cloudera-scm-server: * cloudera-scm-server started

root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:40:31 => service cloudera-scm-server status
Checking for service cloudera-scm-server: * cloudera-scm-server is running
root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:40:35 =>

root@rocky-T410-Ubuntu: /home/rocky/Downloads 16:40:35 => netstat -ln |grep 7180
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN

Now Chromium shows the login and says to restart it, and it works, but I see no sign of Navigator:
Thank you for choosing Cloudera Manager and Cloudera's Distribution Including Apache Hadoop (CDH).
This installer will enable you to later choose packages for the Services below (there may be some license implications).

Apache Hadoop (Common, HDFS, MapReduce, YARN)
Apache HBase
Apache ZooKeeper
Apache Oozie
Apache Hive
Hue (Apache licensed)
Apache Flume
Cloudera Impala (Apache licensed)
Apache Sqoop
Cloudera Search (Apache licensed)
You are using Cloudera Manager to install and configure your system. You can learn more about Cloudera Manager by clicking on the Support menu above.


And I get the hosts search wizard, which I couldn't get again after the failed installation on CentOS:
http://localhost:7180/cmf/express-wizard/hosts


OK, but the logs show errors on a /run/ file which exists (as soon as I looked):
rocky@rocky-T410-Ubuntu: ~/DG/CDH_nightmares 17:01:05 => grep Error CDH_packageUpdatingLog
>>Error: could not find config file /run/cloudera-scm-agent/supervisor/supervisord.conf
>>Error: could not find config file /run/cloudera-scm-agent/supervisor/supervisord.conf
rocky@rocky-T410-Ubuntu: ~/DG/CDH_nightmares 17:01:12 => ff /run/cloudera-scm-agent/
ls: cannot open directory /run/cloudera-scm-agent/: Permission denied
rocky@rocky-T410-Ubuntu: ~/DG/CDH_nightmares 17:01:25 => sudo ff /run/cloudera-scm-agent/
[sudo] password for rocky:
sudo: ff: command not found
rocky@rocky-T410-Ubuntu: ~/DG/CDH_nightmares 17:01:33 => sudo ls -halt /run/cloudera-scm-agent/
total 0
prw------- 1 root root 0 Oct 8 17:00 events
drwxr-x--x 3 root root 120 Oct 8 17:00 supervisor
drwxr-x--x 5 root root 120 Oct 8 17:00 .
drwxr-x--x 2 root root 40 Oct 8 17:00 process
drwxr-x--x 6 root root 120 Oct 8 17:00 cgroups
drwxr-xr-x 38 root root 1.6K Oct 8 17:00 ..
rocky@rocky-T410-Ubuntu: ~/DG/CDH_nightmares 17:01:38 => sudo ls -halt /run/cloudera-scm-agent/supervisor
total 8.0K
drwxr-x--x 3 root root 120 Oct 8 17:00 .
-rw-r--r-- 1 root root 5 Oct 8 17:00 supervisord.pid
srwx------ 1 root root 0 Oct 8 17:00 supervisord.sock
-rw------- 1 root root 904 Oct 8 17:00 supervisord.conf
drwxr-x--x 5 root root 120 Oct 8 17:00 ..
drwxr-x--x 2 root root 40 Oct 8 17:00 include
rocky@rocky-T410-Ubuntu: ~/DG/CDH_nightmares 17:02:00 =>


And I get the same Installation Failed screen w/ red writing telling me to check ports.
I retry, and have the reinstallation happen with the cycling image next to
Waiting for newly installed agent to heartbeat ...
New verb added to English, 'heartbeat' ... can't they hire an editor?

Then the same complete failure screen.
http://localhost:7180/cmf/express-wizard/wizard#step=installStep
! Uninstalled on 1 host(s) after installation failure. Retry Failed Hosts

Hostname IP Address Progress Status
localhost 127.0.0.1
Retry | Details Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).

 

I press Continue and get:
Installation did not complete on any hosts.
To retry installation, close this box and click Retry Failed Hosts.

To get technical support from Cloudera, go to: www.cloudera.com/support


I check 7182, a listener exists, I can talk to it (it's Jetty):
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:11:13 => netstat -ln |grep 7182
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:11:32 => echo "hi there"|nc localhost 7182
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 NOT_FOUND</title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing there. Reason:
<pre> NOT_FOUND</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>

and 9000 and 9001 are listening but not saying anything back (I'm doubtless not saying anything it wants to hear)
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:11:44 => echo "hi there"|nc localhost 9000
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:12:51 => echo "hi there"|nc localhost 9001
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:12:53 => netstat -ln |grep 900
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN

 

 


The error log in /var/log/cloudera-scm-agent/cloudera-scm-agent.log
It claims it can't open port 9000 ...
[08/Oct/2013 17:00:07 +0000] 7658 MainThread agent INFO Successfully connected to supervisor
[08/Oct/2013 17:00:07 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:07] ENGINE Bus STARTING
[08/Oct/2013 17:00:07 +0000] 7658 MainThread _cplogging INFO [08/Oct/2013:17:00:07] ENGINE Started monitor thread '_TimeoutMonitor'.
[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging ERROR [08/Oct/2013:17:00:12] ENGINE Error in 'start' listener <bound method Server.start of <cherrypy._cpserver.Server object at 0x193ffd0>>
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/wspbus.py", line 197, in publish
output.append(listener(*args, **kwargs))
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/_cpserver.py", line 151, in start
ServerAdapter.start(self)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 167, in start
wait_for_free_port(*self.bind_addr)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 410, in wait_for_free_port
raise IOError("Port %r not free on %r" % (port, host))
IOError: Port 9000 not free on 'lykos-localhost'

[08/Oct/2013 17:00:12 +0000] 7658 MainThread _cplogging ERROR [08/Oct/2013:17:00:12] ENGINE Shutting down due to error in start listener:
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/wspbus.py", line 235, in start
self.publish('start')
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/wspbus.py", line 215, in publish
raise exc
ChannelFailures: IOError("Port 9000 not free on 'lykos-localhost'",)



So, I look for and kill the process that has port 9000 and retry ...
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:35:37 => netstat -plten |head -2
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:35:50 => netstat -plten |grep 900
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 1000 1284056 16321/java
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 0 1497927 7685/python
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:35:54 => kill -9 ^C
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:36:19 => PS 16321
UID PID PPID C STIME TTY TIME CMD
rocky 16321 16304 1 15:20 ? 00:02:26 /usr/bin/java -Xms40m -Xmx384m -Dorg.eclipse.equinox.p2.reconciler.dropins.directory=/usr/share/eclipse/dropins -XX:MaxPermSize=256m -jar /usr/lib/eclipse//plugins/org.eclipse.equinox.launcher_1.2.0.dist.jar -os linux -ws gtk -arch x86_64 -showsplash -launcher /usr/lib/eclipse/eclipse -name Eclipse --launcher.library /usr/lib/eclipse//plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.100.dist/eclipse_1408.so -startup /usr/lib/eclipse//plugins/org.eclipse.equinox.launcher_1.2.0.dist.jar --launcher.overrideVmargs -exitdata 316000b -vm /usr/bin/java -vmargs -Xms40m -Xmx384m -Dorg.eclipse.equinox.p2.reconciler.dropins.directory=/usr/share/eclipse/dropins -XX:MaxPermSize=256m -jar /usr/lib/eclipse//plugins/org.eclipse.equinox.launcher_1.2.0.dist.jar
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:36:23 => PS 7685
UID PID PPID C STIME TTY TIME CMD
root 7685 1 0 17:00 ? 00:00:00 /usr/lib/cmf/agent/src/cmf/../../build/env/bin/python /usr/lib/cmf/agent/src/cmf/../../build/env/bin/supervisord
root 7686 7685 0 17:00 ? 00:00:00 /usr/lib/cmf/agent/build/env/bin/python /usr/lib/cmf/agent/src/cmf/supervisor_listener.py -l /var/log/cloudera-scm-agent/cmf_listener.log /run/cloudera-scm-agent/events
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:36:41 =>

 

And now I see 9000 is there again but owned by python:
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:38:05 => netstat -plten |grep 900
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 0 1537781 9627/python
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 0 1497927 7685/python
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:39:24 =>


But same Failed to receive heartbeat from agent. (Current Step)
But no errors now in the log, only a warning that timeout was set to 30, and about my laptop name:
[08/Oct/2013 17:38:53 +0000] 9627 Monitor-HostMonitor throttling_logger WARNING hostname rocky-T410-Ubuntu differs from the canonical name lykos-localhost
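
That warning suggests the machine's own name and the name it resolves to disagree. A rough way to see both values side by side is sketched below. How the agent derives its "canonical name" is not shown in this thread, so using socket.getfqdn here is an assumption, and the function name is hypothetical. The lookup functions are passed in as parameters only so the logic can be exercised without touching the real resolver.

```python
import socket


def hostname_report(gethostname=socket.gethostname,
                    getfqdn=socket.getfqdn,
                    gethostbyname=socket.gethostbyname):
    """Compare the configured hostname with what the resolver returns for it."""
    name = gethostname()
    canonical = getfqdn(name)  # assumption: stands in for the agent's "canonical name"
    try:
        address = gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses.
        address = None
    return {
        "hostname": name,
        "canonical": canonical,
        "address": address,
        "matches": name == canonical,
    }
```

Running print(hostname_report()) with no arguments uses the real resolver; a "matches" value of False, or an address of None, would point at /etc/hosts or DNS rather than at the ports.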


root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:47:45 => netstat -plten |grep 900
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 0 1537781 9627/python
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 0 1497927 7685/python
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:47:48 => netstat -plten |grep 7180
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 120 1487182 6602/java
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:47:54 => netstat -plten |grep 7182
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 120 1487182 6602/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 120 1484581 6602/java
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:47:56 =>
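
One caveat when reading the output above: grep 7182 also printed the 7180 line, because that line's inode column (1487182) happens to contain the digits 7182. Matching on the port field itself avoids that kind of false hit. The sketch below assumes the column layout printed by netstat -plten earlier in these notes; the function name is hypothetical.

```python
def listeners_on_port(netstat_lines, port):
    """Return (local_address, pid, program) for LISTEN sockets on exactly `port`.

    Expected columns (netstat -plten):
    proto recv-q send-q local foreign state user inode pid/program
    """
    found = []
    for line in netstat_lines:
        fields = line.split()
        # Skip headers and anything that is not a listening socket.
        if len(fields) < 9 or fields[5] != "LISTEN":
            continue
        local = fields[3]
        # Compare only the port part of the local address, not the whole line.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        pid, _, program = fields[8].partition("/")
        found.append((local, pid, program))
    return found
```

The input lines could come from subprocess.run(["netstat", "-plten"], capture_output=True, text=True).stdout.splitlines(), run as root so that the PID/Program column is filled in.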


Retry fails again, with new errors now in the log /var/log/cloudera-scm-agent/cloudera-scm-agent.log
[08/Oct/2013 17:49:29 +0000] 12227 MainThread agent ERROR Heartbeating to localhost.localdomain:7182 failed.
Traceback (most recent call last):
File "/usr/lib/cmf/agent/src/cmf/agent.py", line 741, in send_heartbeat
self.master_port)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 471, in __init__
self.conn.connect()
File "/usr/lib/python2.7/httplib.py", line 757, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
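
The gaierror at the bottom of that traceback is raised by getaddrinfo, which means the name localhost.localdomain could not be resolved at all on this machine. A quick way to test which candidate names resolve is sketched below; the function name is an illustrative assumption, and 7182 is simply the port from the error message.

```python
import socket


def resolve(host, port, getaddrinfo=socket.getaddrinfo):
    """Return the sorted unique addresses for host, or None if resolution fails."""
    try:
        results = getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # Same failure class as in the agent traceback.
        return None
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling resolve("localhost", 7182), resolve("localhost.localdomain", 7182) and resolve(socket.gethostname(), 7182) in turn shows which names the resolver knows about; a None result for the name the agent heartbeats to would be consistent with the error above.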


First I check whether the PIDs have changed, then I kill all related processes and retry. Note that the 900x PIDs changed, but not the PID on 7182, which is the port the agent failed to 'heartbeat' to:

root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:47:56 => vi /var/log/cloudera-scm-agent/*agent.*
2 files to edit
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:52:54 => netstat -plten |grep 7182
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 120 1487182 6602/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 120 1484581 6602/java
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:52:59 => netstat -plten |grep 900
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 0 1545186 12227/python
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 0 1545178 12249/python
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:53:02 =>

 

And that succeeds ...
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:53:02 => kill -9 12227 12249 6602
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:55:23 => netstat -plten |grep 900
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:55:26 => netstat -plten |grep 7182
root@rocky-T410-Ubuntu: /home/rocky/Downloads 17:55:30 =>

And now I can't "Retry" this popup:
Error
An error occurred. Try again.
But closing that popup shows the circling sign with "Retrying..."

No change in progress bar for 2 mins, click Abort and see same error popup then "Aborting..." and nothing...
and the Details button says 'loading' the log but nothing, and Continue does nothing.

 

 

 

Posts: 416
Topics: 51
Kudos: 89
Solutions: 49
Registered: ‎06-26-2013

@rocky I have moved this post to the Cloudera Manager board in hopes that somebody in here can assist you with this agent heartbeat issue.


Regards

New Contributor
Posts: 6
Registered: ‎10-09-2013

OK, thanks.  Will post another simple one ...

Best-Rocky