Created on 09-26-2018 09:51 PM - edited 08-17-2019 11:40 PM
Current issue I'm having:
* Cloudbreak deployment server stops serving the web UI and becomes unusable after 30-90 minutes of use.
* Restarting cbd ("cbd restart") at the shell prompt sometimes brings the GUI back up, but the username/password no longer works. When I use the Linux passwd command to reset the password, I can log in to the GUI, but it has lost all of my configurations and clusters and shows an error indicating that the UI cannot connect to Cloudbreak.
* I have now scrapped it and started over 3 times (generating 3 new CloudBreak deployment VMs) and had the same result every time.
More details and background:
I downloaded and installed Cloudbreak 2.7.1 for CentOS 7. I used it to generate a cbd-deployment virtual machine (in GCP), following the QuickStart (https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.7.1/content/gcp-quick/index.html#g...).
It ran successfully and spit out a new public IP address for me to log into via https://THE_IP_ADDRESS. I navigated to the Cloudbreak GUI and logged in with my admin user credentials. I went through the entire process of setting up certs and deploying a cluster according to one of the blueprints. Everything worked wonderfully!! So excited!
Then, half an hour later, when I tried using the Cloudbreak deployment GUI again, the server was no longer responding. I used ssh to connect to the Linux shell and see what was going on. I found a couple of topics here in the HCC forum that gave hints for troubleshooting the issue. Here are the two that seemed promising:
I tried a few things from the first article:
cd /var/lib/cloudbreak-deployment
cbd ps
cbd start
This didn't solve the problem. Reading further, I tried:
cbd restart
This brought the GUI back, but my username/password was rejected. I used the Linux "passwd" command to reset the password for my admin user, and once again I could get past the login screen. But, as noted above, an error popped up stating that it could not connect to Cloudbreak. A red badge at the upper-right side of the GUI highlighted [CloudBreak 0] as being in a failed state. All of the GUI screens (clusters, blueprints, credentials) showed up blank (no data). All is lost!!! 😉
Not knowing what else to do, I started over. I have checked that SELinux and firewalld are disabled on the VM that Cloudbreak generated for me. Initially, SELinux was disabled but firewalld was active (in the generated VM). So, I disabled and stopped the firewalld service and tried "cbd restart". Now (on my 3rd dead cloudbreak-deploy VM), I can't seem to revive even the GUI to the point where it will let me log in.
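For anyone repeating the SELinux and firewalld checks described above, here is a minimal sketch of how they could be run in one go on a CentOS 7 host. The check helper is a hypothetical name introduced only for this example and is not part of cbd; getenforce and systemctl are the standard CentOS 7 tools, and the script simply reports "not found" if either is missing.

```shell
#!/bin/sh
# Sketch only: report SELinux and firewalld state on a CentOS 7 host.
# check() is a hypothetical helper for this example, not part of cbd.
check() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        # Capture the output even when the command exits non-zero
        # (systemctl is-active does so for inactive units).
        printf '%s: %s\n' "$label" "$("$@" 2>&1)"
    else
        printf '%s: %s not found\n' "$label" "$1"
    fi
}

check "SELinux mode" getenforce
check "firewalld state" systemctl is-active firewalld
check "firewalld at boot" systemctl is-enabled firewalld
```

Running it before and after "cbd restart" shows whether either setting changed between attempts.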
Is the deployment VM generated by Cloudbreak 2.7.1 this unstable for everyone? Is CentOS 7 a bad mix with this version? Any suggestions please! I would love to use Cloudbreak, but I'm losing a little confidence in its ability to not lose all my data and crash. Lol.
Thanks in advance!
-Phil
Created 09-27-2018 08:02 AM
Could you please attach the output of the following to the case:
cd /var/lib/cloudbreak-deployment
cbd ps
cbd create-bundle
That will contain all the logs necessary to find out what happened, without any sensitive info in them.
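If it is easier to attach a file than to paste terminal output, the two commands above could be wrapped roughly as follows. This is only a sketch: collect_cbd_info and the output path $HOME/cbd-ps.txt are hypothetical names chosen for the example, and it assumes cbd is on the PATH and the deployment lives in /var/lib/cloudbreak-deployment as in the commands above.

```shell
#!/bin/sh
# Sketch only: save the "cbd ps" output to a file, then build the bundle.
# collect_cbd_info is a hypothetical helper name, not part of cbd.
collect_cbd_info() {
    deploy_dir=$1
    out_file=$2
    if [ ! -d "$deploy_dir" ]; then
        echo "deployment directory not found: $deploy_dir" >&2
        return 1
    fi
    if ! command -v cbd >/dev/null 2>&1; then
        echo "cbd not found on PATH" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$deploy_dir" || exit 1
        cbd ps > "$out_file" 2>&1
        cbd create-bundle
    )
}

collect_cbd_info /var/lib/cloudbreak-deployment "$HOME/cbd-ps.txt" ||
    echo "collection failed; see message above" >&2
```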
Hope this helps!
Created 09-27-2018 02:40 PM
Exactly the same problem here. I followed the instructions here and tried multiple times.
Created 10-12-2018 10:01 PM
We are running into the same issue. We are also working with support (they have been awesome). Interestingly enough, upgrading to 2.7.2 didn't fix the issue.
--joe
Created 10-26-2018 01:05 PM
@Phil Scott @Joe Diolosa @andrew chen
Sorry for the late response. Your observation was right: a remote update by Google in the launched instances resulted in Cloudbreak stopping after around one hour.
The fix is already merged.
Could you please try out the 2.7.3-rc.4 version, which already contains the fix?
Hope this helps & sorry for the inconvenience.
Created 11-08-2018 08:45 AM
@Phil Scott @Joe Diolosa @andrew chen
Unfortunately there was an unrelated issue in our 2.7.3-rc.4 build causing an error, which we have corrected in 2.7.3-rc.21, so that version is expected to work.
Sorry about the inconvenience!