Hi folks. Hope you can help. While using Cloudera with Impala I ran into something weird for the new year. The virtual machines we were running Cloudera/Impala on shut down due to a power issue. When we restarted the machine and the virtual servers, we got the following critical error in Cloudera Manager:
"The Impala StateStore is not running."
Which SOUNDS straightforward but...when I click the
I get a "green light" status on the Impala StateStore service with the message:
"This role's status is as expected. The role is starting."
Now it doesn't say "started" but "starting", which is where I suspect the problem is. I've looked at the logs and they appear to be in order. I see an E0103 as a single entry every time I restart the Cloudera service or reboot the service, but no other message (I cannot find what E0103 means). I checked the daemon status using
netstat -lpnt | grep statestore
and found instances listening on ports 24000 and 25010, and iptables is disabled on the server.
I could disable the error but I feel that isn't right. I can run queries in the Impala Hue interface with no issue. Can anyone tell me what is going on and how to clear the error (the status or the cause) in a proper manner? Do I actually have a problem? Everything was great until the irregular shutdown due to the power outage, but that shouldn't cause this if the files are intact. I can add additional error logs if needed, but would like to know what I should be looking for. I can't restart the Impala StateStore service manually, as it appears to be tied into Cloudera Manager.
This sort of issue can happen from time to time. Cloudera engineering is working to help guard against this in future releases.
For now, if you have a role stuck in the states STARTING or STOPPING, here are the manual steps to correct that.
The following steps require updating the Cloudera Manager database. Make sure you have a backup just in case something goes wrong.
Determine the true role state by querying the supervisord on the agent where the role runs.
To do so, we can use a client of supervisord called supervisorctl.
On the host where the role is listed as STARTING or STOPPING, run:
/usr/lib64/cmf/agent/build/env/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf
The command will return the states of all the processes that supervisord knows about. Find the StateStore process and note its status. For example, the output may look like this:
251-impala-STATESTORE RUNNING pid 826, uptime 1:48:31
The second column is the true status of the process. Note it.
Type "quit" and hit enter to exit the supervisorctl interactive tool.
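If you would rather not use the interactive prompt, supervisorctl also accepts a "status" subcommand that prints the same table and exits. The sketch below is only an illustration: statestore_state is a hypothetical helper name, and it assumes the output has the process name in the first column and the state in the second, as in the example line above.

```shell
# Hypothetical helper: read `supervisorctl status`-style output on stdin and
# print the state (second column) of every process whose name contains STATESTORE.
statestore_state() {
  awk '$1 ~ /STATESTORE/ { print $2 }'
}

# Example on the affected host (paths as given in the step above):
#   /usr/lib64/cmf/agent/build/env/bin/supervisorctl \
#     -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf status | statestore_state
```

If this prints nothing, supervisord does not know about a StateStore process on that host, which is worth double-checking before touching the database.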
Retrieve the role's row from the Cloudera Manager database.
NOTE: If you don't know how to connect to the CM database, look in /etc/cloudera-scm-server/db.properties for the connection settings.
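As a rough sketch of pulling the connection settings out of that file: db_prop below is a hypothetical helper, and the key names in the usage comment (com.cloudera.cmf.db.type and so on) are an assumption about how the file is laid out, so check them against your own db.properties.

```shell
# Hypothetical helper: print the value of one key from a key=value properties file.
# Usage: db_prop <key> [file]
db_prop() {
  key="$1"
  file="${2:-/etc/cloudera-scm-server/db.properties}"
  # Skip comment lines, match the key exactly, and print everything
  # after the first "=" so values containing "=" survive intact.
  awk -F= -v k="$key" '
    /^[[:space:]]*#/ { next }
    $1 == k { sub(/^[^=]*=/, ""); print; exit }
  ' "$file"
}

# Example (key names are assumed, verify them in your file):
#   db_prop com.cloudera.cmf.db.type
#   db_prop com.cloudera.cmf.db.host
#   db_prop com.cloudera.cmf.db.name
#   db_prop com.cloudera.cmf.db.user
```

The database type tells you which client to connect with, and the host, name, and user values supply the rest of the connection.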
Search for the role that is stuck in STARTING with the following query:
select role_id, name, configured_status from ROLES where configured_status = 'STARTING';
Note the role_id. You will set that role's configured_status to either RUNNING or STOPPED, depending on the supervisor status that was observed for that role in step 1 above.
Update the row via SQL:
If the role was running, update the row with a configured_status of RUNNING:
update ROLES set configured_status = 'RUNNING' where role_id = <role_id from step 2>;
If the role was not running when you checked with supervisord, set configured_status to STOPPED:
update ROLES set configured_status = 'STOPPED' where role_id = <role_id from step 2>;
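To reduce the chance of a typo, the choice between the two statements can be sketched as a small shell function. role_fix_sql is a hypothetical name. Treating the supervisord states EXITED and FATAL as "not running" is my assumption rather than something stated above, and so is the check that role_id is numeric. The function only prints the statement, using single-quoted string literals as standard SQL expects, so you can review it before running it in your database client.

```shell
# Hypothetical helper: print the UPDATE statement that matches the state
# observed in supervisorctl. It prints only; it does not touch the database.
# Usage: role_fix_sql <supervisor_state> <role_id>
role_fix_sql() {
  state="$1"
  role_id="$2"
  # Assumption: role_id is a plain integer, as the unquoted placeholder suggests.
  case "$role_id" in
    ''|*[!0-9]*) echo "role_id must be a number" >&2; return 1 ;;
  esac
  case "$state" in
    RUNNING) target=RUNNING ;;
    # Assumption: EXITED and FATAL also count as "not running".
    STOPPED|EXITED|FATAL) target=STOPPED ;;
    # Refuse anything else (for example STARTING) instead of guessing.
    *) echo "unhandled supervisor state: $state" >&2; return 1 ;;
  esac
  printf "update ROLES set configured_status = '%s' where role_id = %s;\n" "$target" "$role_id"
}

# Example: role_fix_sql RUNNING 42
```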
Restart Cloudera Manager:
service cloudera-scm-server restart
This should correct the issue, but please let us know if you have any trouble or questions.
Thanks for the quick response. Here is the output for Impala statestore:
2918-impala-STATESTORE RUNNING pid 3091, uptime 1:15:50