After upgrading from Altus Director 2.8 to 6.2, deployments to Azure using Azure Postgresql for the external DB fail due to Director appending multiple instances of the DB hostname to the DB usernames defined in the cluster bootstrap file.
I am deploying CDH v5.16.1 and using the same cluster definition that I used with Altus 2.8 to successfully deploy several clusters which use Azure Postgresql as the external DB.
As an example of the issue I'm seeing, in the cluster bootstrap I have CDH DB usernames defines as:
However, upon initial launch of the bootstrap job, cloudera-scm-server fails to connect to the DB instance. In the /etc/cloudera-scm-server/db.properties I see that the DB username is defined as:
It appears as if some rogue logic in the Director deployment script is looping over the DB parameters and appending the DB host to the username 2 additional times instead of simply reading in the username as defined in the bootstrap file.
If I update this file to reflect the correct username and restart, cloudera-scm-server is able to successfully connect.
During firstrun of the cluster, Hive, Oozie and Hue fail to start for the same reason, however this time when I update the server configuration with the correct DB username, the service fails to start and throws an error warning that usernames should contain only alphanemeric and underscore characters. I assumed it was complaining about the "@". When I removed the "@hostname" and left only the base name (i.e. cdh_hive) and tried to start the service, it failed with an error indicating that the username has to be in the format of user@dbhostname - which is what I tried in the first place. Something seems fundamentally broken.
Error before change:
Caused by: java.lang.IllegalArgumentException: Name must contain only alphanumeric characters and underscores: 'cdh_hive@ca2a0jd1-postgres-70d84636'
Error after change:
Caused by: org.postgresql.util.PSQLException: FATAL: Invalid Username specified. Please check the Username and retry connection. The Username should be in <username@hostname> format.
The above attempted workarounds are hacks and I shouldn't have to do it. Director should deploy the cluster using the DB username parameters defined in the cluster bootstrap file. I didn't have this issue with Altus Director 2.8 when deploying to Azure, nor do I have this problem with 6.2 deploying in AWS. In the CDH documentation for Azure MySQL it is specifically mentioned that you should define the DB usernames as "username@host" however no mention of this is made of Azure Postgresql, which has the same requirement. The leads me to believe that Director 6.2 is making some incorrect assumptions WRT parsing DB parameters in preparation for deployments with Azure Postgresql backends.
Thanks Mike. Here's some additional info in case is helps...
As I mentioned above, I used the hack around to get cluster bootstrap to continue. Given that Hive, Ooozie, and Hue fail to start during firstrun (due to corruption of DB username defined in cluster bootstrap file), I remove the services then try to add manually using Cloudera Manager, starting with Hive. The DB connection check performed during the add service wizard succeeds, however the service fails to start with the error below:
+ [[ create_metastore_tables == create_metastore_tables ]] + '[' 5 -ge 5 ']' + skip_if_tables_exist metastore_db_py.properties + '[' -z metastore_db_py.properties ']' + DB_INFO_FILENAME=metastore_db_py.properties + JDBC_JARS=/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar + [[ -z '' ]] + JDBC_JARS_CLASSPATH='/usr/share/cmf/lib/*:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar' ++ /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -cp '/usr/share/cmf/lib/*:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar' com.cloudera.cmf.service.hive.HiveMetastoreDbUtil /run/cloudera-scm-agent/process/181-hive-metastore-create-tables/metastore_db_py.properties unused --printTableCount Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: Name must contain only alphanumeric characters and underscores: 'cdh_hive@ca2a0jd1-postgres-70d84636'
It would seem to be complaining about the username, the but @ is required so perhaps the mention of username in the error is a red herring. If this looks like an unrelated bug, I can submit in a separate post.
Thanks for the update Mike. That was my guess as well based on some clues in your documentation for Altus 6.2 / Azure. I'm actually in the middle of preparing to test the deployment with Azure MySQL, but would much prefer to stick with Postgres if it looks like a quick fix.
I deployed Azure MySQL 5.7 instance, prepared the DB and updated my cluster bootstrap file to reflect use of mysql instead of postgres. Hitting what seems to be the exact same issue. During bootstrap, in the cloudera-scm-server.log:
ava.sql.SQLException: Access denied for user 'cdh_manager@ca2143d1-mysql-79d7fe2b@ca2143d1-mys'@'10.0.10.5' (using password: YES) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4187) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:927) at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1709) at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1252) at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2488) at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2521) at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2306) at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:839) at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:49) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:421) at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:350) at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135) at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182) at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171) at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137) at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014) at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32) at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810) at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
/etc/cloudera-scm-server/db.properties file has corrupt username again:
# # This file is managed by Cloudera Altus Director and may be updated automatically # Generated during cluster bootstrap at 2019-05-15T20:10:18.131Z # com.cloudera.cmf.db.type=mysql com.cloudera.cmf.db.host=ca2143d1-mysql-79d7fe2b.mysql.database.azure.com:3306 com.cloudera.cmf.db.name=cdh_manager com.cloudera.cmf.db.user=cdh_manager@ca2143d1-mysql-79d7fe2b@ca2143d1-mysql-79d7fe2b@ca2143d1-mysql-79d7fe2b com.cloudera.cmf.db.password=****************
Note that the username is defined in cluster bootstrap as:
I'm glad you asked, actually. MySQL seems to work as I'd expect after a fix. I'm having some difficulty replicating the issue in Postgres, though, because I can't seem to progress past a certain point. Is there anything special you have to do to your Postgres instance to get by this error?
ERROR: admin-user@host must be member of role scm_xxxxx
That errors seems to indicate that the admin user wasn't added as member of your CDH user role. We do this as part of the DB prep prior to inititiating cluster deployment. In our case, postgres is our admin user:
CREATE ROLE cdh_manager LOGIN PASSWORD '**********'; GRANT cdh_manager TO postgres; CREATE DATABASE cdh_manager OWNER cdh_manager ENCODING 'UTF8';
We repeat this for every CDH DB user we create.
Thank you -- clearly I'm not as familiar with Postgres. :-) I'll keep you informed.