
Install a new Ambari-Server on existing HDP stack without data loss

Contributor

Hello there,

We are facing a big problem at our company:

Most of the /var directory was accidentally deleted during a botched log cleanup on the Ambari-Server host.

Among the removed files were the .map files PostgreSQL uses to map relations to their on-disk storage. As a consequence, the Ambari-Server can't start anymore. The databases are still there, but we get a "could not find relation mapping" error, so we can't access them properly.

The problem: no DB backup, no snapshots, no directory backup at all. (Yes, I know, this is very bad for a production cluster...)

The Ambari cluster has been down since yesterday.

I found this thread today, but I'm not sure it's the best way forward for us:

https://community.hortonworks.com/questions/6703/ambari-server-installation-after-cluster-setup.html

Info:

HDP 2.6.1 / Ambari 2.5.1 / PGSQL 9.5

1 Gateway with Ambari-Server (the impacted server) / 2 Masters / 4 Slaves

Our goal is to save the data on the Datanodes and get an Ambari Cluster back.

Any ideas or similar past experiences?

Thanks all

EDIT 1: The impacted databases on the host are Ambari / Hive / Oozie / Hue.


10 REPLIES

Super Collaborator

Ambari doesn't control any data residing on the datanodes, so you should be safe there.

What I would do is let all the Hadoop components keep running "in the dark" by stopping all the ambari-agents in the cluster, maybe even uninstalling them.

Then, install and set up a new Ambari server, add a cluster, but register no hosts.
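
As a rough sketch of that step, assuming a CentOS/RHEL host with the same Ambari 2.5.1 repository the original server used:

```bash
# Install and configure a fresh Ambari server; the interactive setup
# creates a new, empty Ambari database (the old one is corrupted
# anyway, so accepting the embedded PostgreSQL here is fine).
yum install -y ambari-server
ambari-server setup

# Start it and confirm it is running before touching the agents.
ambari-server start
ambari-server status
```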

Configure each of the stopped ambari-agents to point at the new Ambari server's address, and start them.
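
For example, a minimal sketch of repointing one agent (the config path and the [server] hostname key are the standard ambari-agent layout; the new server's hostname below is a placeholder):

```bash
# Run on every cluster node.
NEW_AMBARI_HOST=new-ambari.example.com   # placeholder hostname

ambari-agent stop

# Point the agent at the new Ambari server in
# /etc/ambari-agent/conf/ambari-agent.ini ([server] section).
sed -i "s/^hostname=.*/hostname=${NEW_AMBARI_HOST}/" \
    /etc/ambari-agent/conf/ambari-agent.ini

ambari-agent start
```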

Add the hosts in the Ambari server UI, selecting the "manual registration" option at the bottom of the dialog. Hopefully all the hosts register successfully, after which you are given the option of installing clients and servers.

Now, you could try to "reinstall" what is already there, but you might want to deselect all the servers in the datanode column. In theory, it will attempt the OS package installation, see that the packages already exist, and not error out. If it does error, restart the install process and deselect everything -- at which point it should continue, and you will have Ambari back up and running with all the hosts monitored, just with no processes to configure.

To add the services back, you would need to use the Ambari REST API to add back the respective Services, Components, and Host Components that are running on the cluster. If you can't remember which ones those are among everything HDP offers to install, go to each host and check its running processes.
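
As an illustration, a minimal sketch of those REST calls for a single HDFS datanode (the server URL, credentials, cluster name, and hostname are all placeholders; the endpoints are Ambari's v1 API):

```bash
AMBARI=http://new-ambari.example.com:8080   # placeholder server URL
AUTH='admin:admin'                          # placeholder credentials

# 1. Add the service to the cluster.
curl -u $AUTH -H 'X-Requested-By: ambari' -X POST \
    $AMBARI/api/v1/clusters/mycluster/services/HDFS

# 2. Declare the component on that service.
curl -u $AUTH -H 'X-Requested-By: ambari' -X POST \
    $AMBARI/api/v1/clusters/mycluster/services/HDFS/components/DATANODE

# 3. Attach the component to the host that already runs it.
curl -u $AUTH -H 'X-Requested-By: ambari' -X POST \
    $AMBARI/api/v1/clusters/mycluster/hosts/worker1.example.com/host_components/DATANODE

# 4. Mark it INSTALLED so Ambari adopts the existing installation
#    instead of trying to deploy it from scratch.
curl -u $AUTH -H 'X-Requested-By: ambari' -X PUT \
    -d '{"HostRoles": {"state": "INSTALLED"}}' \
    $AMBARI/api/v1/clusters/mycluster/hosts/worker1.example.com/host_components/DATANODE
```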

Contributor

Thanks for the advice, the approach seems interesting.

I guess there is no need to totally uninstall the agents, just to change the Ambari-Server host in the ambari-agent.ini file.

I tried to add an existing host to the new Ambari-Server with manual registration. The host could be added, but I got some warnings after the host checks, about the packages already installed and the users already created on the host by the old installation. I haven't finalized the Add Host process yet, because I wanted feedback first. I don't know whether it is better to uninstall all the existing packages/users or not.

In your opinion, is it better to first add the hosts with no services configured in Ambari, and then add the services back step by step?

The other question is about the Hive Metastore. Can I create a new one, similar to the old one and with all the existing data, without trouble?

Super Collaborator (Accepted Solution)
@Tom C

That's just a warning that you have existing processes on the machines. If you let it uninstall packages or delete user accounts, you'll have downtime on the cluster, and services might not stop gracefully, so you risk additional corruption.

I've added machines like this that are provisioned by Puppet, and so there are some extra background services running, but I just ignore that warning, and Ambari has set them up fine.

Regarding the Hive Metastore, if you have set it up to use an external Postgres/MySQL database (recommended), I would probably let Ambari install the embedded Derby database for Hive first, then manually edit hive-site.xml to point it at the old one.
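
For illustration, these are the standard hive-site.xml connection properties you would repoint afterwards (host, port, database name, and credentials below are placeholders):

```xml
<!-- Repoint the metastore at the surviving/external PostgreSQL DB.
     All values here are placeholders for your actual settings. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://db-host.example.com:5432/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive-password</value>
</property>
```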

Contributor

@Jordan Moore Thanks again for the advice. We will try this solution to rebuild a new Ambari-Server without data loss.

For the Hive Metastore, the problem is the same as for the Ambari-Server: the database is corrupted (the Ambari, Hive, Hue, and Oozie databases were all on the impacted host). We will have to build a new one, I guess.
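
For reference, a minimal sketch of initializing a fresh metastore schema with Hive's schematool, assuming hive-site.xml already points at a new, empty PostgreSQL database (the HDP path below may vary by installation):

```bash
# Create the metastore tables in the empty database.
/usr/hdp/current/hive-metastore/bin/schematool -dbType postgres -initSchema

# Sanity check: report the schema version that was just created.
/usr/hdp/current/hive-metastore/bin/schematool -dbType postgres -info
```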

Master Guru

Once it is working, back up everything. Make sure every service is HA.

Contributor

@Timothy Spann Yes, it's planned! Does Jordan's solution seem sound to you?

Master Guru

Yes, it does.

Contributor

Some news, @Jordan Moore @Timothy Spann:

Adding the existing hosts to a new Ambari-Server worked well! We got all the HDFS data back.

We added the same roles that were present before the crash, and configured PGSQL for Hive and Oozie.

We have some missing blocks under /ats/done/... due to the stop/start of the HDFS services, but it's not critical: only 74 out of 570,000 blocks.
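
In case it helps others, the standard way to inspect and clean up such missing blocks (the /ats/done path comes from our case above; note that -delete permanently removes the affected files, so review the list first):

```bash
# List every file with corrupt or missing blocks.
hdfs fsck / -list-corruptfileblocks

# Look at the affected subtree in detail.
hdfs fsck /ats/done -files -blocks -locations

# If the files are expendable, delete them to return to a healthy state.
hdfs fsck /ats/done -delete
```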

We are now working on Ambari configuration and Hive Metastore.

And of course, backups are coming!

Thanks for your help.

Explorer

Hi,

But what about the service configurations? Even when you re-add your hosts, Ambari won't recognize the old service configurations. Do those have to be re-entered when "reinstalling" the cluster?

Manfred