Created on 01-11-2018 03:57 AM - edited 09-16-2022 05:43 AM
I am trying to configure a pseudo cluster with both CM 5.13 and CHD 5.13 installed in the cluster nodes (docker containers), but am having some problems getting the Hive service work.
Specifically the issue I have is that the Hive Metastore is not using my local MySQL database, but is instead defaulting to a non existent Postgresql database .
Here are the steps I took:
1) I configured Mysql server and a database for the metastore by following the official steps here:
And as described in the last step, I added the following properties to /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hive/conf/hive-site.xml:
(...) xml that was already there (...) <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/metastore</value> <description>the URL of the MySQL database</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>mypassword</value> </property> <property> <name>datanucleus.autoCreateSchema</name> <value>false</value> </property> <property> <name>datanucleus.fixedDatastore</name> <value>true</value> </property> <property> <name>datanucleus.autoStartMechanism</name> <value>SchemaTable</value> </property> <property> <name>hive.metastore.schema.verification</name> <value>true</value> </property>
And finally I restarted the CMS.
2) After this I tried to start the Hive Metastore from the CM but it did not work, and after inspecting the stderr log I found the following error:
+ exec /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hive/bin/hive --config /var/run/cloudera-scm-agent/process/188-hive-HIVEMETASTORE --service metastore -p 9083~ javax.jdo.JDOException: Exception thrown when executing query at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596) .... org.postgresql.util.PSQLException: ERROR: column A0.SCHEMA_VERSION_V2 does not exist
And indeed, after checking
/var/run/cloudera-scm-agent/process/188-hive-HIVEMETASTORE/hive-site.xml
The first line of the file was:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:postgresql://headnode1.cluster:7432/hive</value> </property>
The error "column A0.SCHEMA_VERSION_V2 does not exist", from what I could tell, is related to the metastore database schema not being correct, but in my Mysql metastore database I used:
SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-1.1.0.mysql.sql;
So I believe the problem here is the fact that the CSM is ignoring the config file in /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hive/conf/hive-site.xml
Any ideas on this issue?
Thank you very much.
Created 01-11-2018 01:05 PM
Hello @mm25,
/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hive/conf/hive-site.xml is merely a stock file that is not used to start a service. When you manage a cluster with Cloudera Manager, Cloudera Manager will initiate the start of the service. It will use what you have configured in Cloudera Manager to generate all files necessary to start the process. This includes hive-site.xml.
So, in short, the answer to why Hive is trying to use postgres is that you have it configured to do so in the configuration stored in Cloudera Manager. To affect change you will need to change the configuration within Cloudera Manager's configuration.
- go to Clusters --> Hive
- Click on Configuration (subtab)
This will display your configuration.
- On the left side, Click on Hive Metastore Database to view all database-related configuraiton options.
- You can update the configuration and Save/Restart Hive to have the changes take effect.
If you want to view the generated configuration files, you can view them in a Role's Processes subtab or view them at the commandline in /var/run/cloudera-scm-agent/process
Each time a process is started by Cloudera Manager, a new process directory is created under that dir.
I generally use ls -lrt |grep hive or something of the sort to see the most recent process directory created for a service's roles.
Hope that does the trick!
Created 01-11-2018 01:05 PM
Hello @mm25,
/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hive/conf/hive-site.xml is merely a stock file that is not used to start a service. When you manage a cluster with Cloudera Manager, Cloudera Manager will initiate the start of the service. It will use what you have configured in Cloudera Manager to generate all files necessary to start the process. This includes hive-site.xml.
So, in short, the answer to why Hive is trying to use postgres is that you have it configured to do so in the configuration stored in Cloudera Manager. To affect change you will need to change the configuration within Cloudera Manager's configuration.
- go to Clusters --> Hive
- Click on Configuration (subtab)
This will display your configuration.
- On the left side, Click on Hive Metastore Database to view all database-related configuraiton options.
- You can update the configuration and Save/Restart Hive to have the changes take effect.
If you want to view the generated configuration files, you can view them in a Role's Processes subtab or view them at the commandline in /var/run/cloudera-scm-agent/process
Each time a process is started by Cloudera Manager, a new process directory is created under that dir.
I generally use ls -lrt |grep hive or something of the sort to see the most recent process directory created for a service's roles.
Hope that does the trick!
Created 01-17-2018 09:59 AM
Hi @bgooley,
Sorry about the late reply. But yup, that was exactly the issue, once I inserted the right configurations in Cloudera Manager it connected to the Mysql server correctly
Thank you for the help!
Created 01-17-2018 10:01 AM
Awesome! Thank you for following up with the good news!