I've seen other threads on this issue, but the steps on those threads did not help me solve this issue. Thank you for the help!
The cluster was working fine for ~2 weeks. I noticed last week that the service manager was down, but ignored it as all the services were working fine. Today, I tried working with Kafka and got a java.net.ConnectionRefused exception, then noticed I couldn't start spark-shell as it said it can't connect to node2.
Under "Instances" it says no heartbeat has been received from node2 for 13 days; all the other nodes seem to be fine.
Running "service cloudera-scm-agent start" on node2 fails with this output:
/etc/init.d/cloudera-scm-agent: line 123: /var/log/cloudera-scm-agent/cloudera-scm-agent.out: Read-only file system
Starting cloudera-scm-agent: /etc/init.d/cloudera-scm-agent: line 128: /var/run/cloudera-scm-agent.pid: Read-only file system
/etc/init.d/cloudera-scm-agent: line 126: /var/log/cloudera-scm-agent/cloudera-scm-agent.out: Read-only file system
I've also just tried restarting the cluster. Stumped on this one :/. Thanks for the help!
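"Read-only file system" means the filesystem holding /var/log and /var/run on node2 is currently mounted read-only, so the agent cannot write its log or PID file. Try to touch a file in /var/log; if you get the same error, OS/disk troubleshooting is a good place to start. Linux can remount a filesystem read-only after it hits disk or filesystem errors, so the kernel log on node2 is worth checking for I/O errors.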
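A rough sketch of that check, to be run as root on node2. The helper names (check_writable, list_ro_mounts) and the scratch file name are made up for illustration; the script assumes a Linux host where /proc/mounts is available.

```shell
#!/bin/sh
# Try to create and remove a scratch file in the given directory.
# Prints the result and returns 0 if the directory is writable, 1 otherwise.
check_writable() {
    dir="$1"
    probe="$dir/.rw_probe_$$"   # hypothetical scratch file name
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "$dir: writable"
        return 0
    fi
    echo "$dir: not writable"
    return 1
}

# List mount points whose options include "ro" (Linux only).
# Some pseudo-filesystems are read-only by design, so look for
# the real disk partitions in the output.
list_ro_mounts() {
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2, $4 }' /proc/mounts
}

# The two directories the init script failed to write to.
check_writable /var/log || echo "check mounts and kernel log"
check_writable /var/run || echo "check mounts and kernel log"
list_ro_mounts
```

If /var/log shows up as not writable and its filesystem appears in the read-only list, the output of dmesg on node2 is the next place to look for disk errors.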