
Impala Catalog Server down after upgrading from 5.11.0 to 5.11.1

Expert Contributor

We upgraded to 5.11.1 and are now unable to run any Impala queries.

 

Error: 

Query: show databases

ERROR: AnalysisException: This Impala daemon is not ready to accept user requests. Status: Waiting for catalog update from the StateStore.

 

Statestore logs: 

 

I0706 12:54:32.296458 28189 authentication.cc:427] Successfully authenticated principal impala/cba24uu.abc.cdb.com@ABC.CDB.COM on an internal connection

I0706 12:54:32.296932 28401 statestore.cc:381] Registering: catalog-server@cba24uu:26000

I0706 12:54:32.297024 28401 statestore.cc:404] Subscriber 'catalog-server@cba24uu:26000' registered (registration id: 16404957b6105e9d:7340f75c059dbe95)

I0706 12:54:32.310817 28156 status.cc:114] Couldn't open transport for cba24uu:23020 (authorize: cannot authorize peer)

    @           0x8394e9  (unknown)

    @           0xdac876  (unknown)

    @           0xdacb92  (unknown)

    @           0xa505ab  (unknown)

    @           0xa50b83  (unknown)

    @           0xb36d62  (unknown)

    @           0xb39c4e  (unknown)

    @           0xb400b6  (unknown)

    @           0xbdcd09  (unknown)

    @           0xbdd6e4  (unknown)

    @           0xe2717a  (unknown)

    @     0x2b5ed7b36aa1  start_thread

    @     0x2b5ed7e3493d  clone

I0706 12:54:32.310847 28156 thrift-client.cc:67] Unable to connect to cba24uu:23020

I0706 12:54:32.310878 28156 statestore.cc:696] Unable to send heartbeat message to subscriber catalog-server@dig24au:26000, received error: Couldn't open transport for cba24uu:23020 (authorize: cannot authorize peer)

I0706 12:54:32.316840 28144 status.cc:114] Couldn't open transport for cba24uu:23020 (authorize: cannot authorize peer)

 

If I telnet to the host and port, the connection works.

 

Catalog server logs:

 

I0707 09:37:11.706931 17577 thrift-server.cc:391] Command '/var/run/cloudera-scm-agent/process/2951-impala-CATALOGSERVER/altscript.sh sec-0-ssl_private_key_password_cmd' executed successfully, .PEM password retrieved
I0707 09:37:11.713904 17577 thrift-server.cc:449] ThriftServer 'StatestoreSubscriber' started on port: 23020s
I0707 09:37:11.714009 17577 statestore-subscriber.cc:203] Registering with statestore
I0707 09:37:11.801826 17577 statestore-subscriber.cc:169] Subscriber registration ID: 664bb584455ec4bf:a5fd7f54e1e7009f
I0707 09:37:11.801847 17577 statestore-subscriber.cc:207] statestore registration successful
I0707 09:37:11.803041 17577 catalogd-main.cc:91] Enabling SSL for CatalogService
I0707 09:37:11.830278 18039 thrift-util.cc:111] TAcceptQueueServer: Caught TException: No more data to read.
I0707 09:37:11.830605 17997 HdfsTable.java:1105] Fetched partition metadata from the Metastore: mssql_polybase.sample_data
I0707 09:37:11.833709 18039 thrift-util.cc:111] TAcceptQueueServer: Caught TException: No more data to read.
I0707 09:37:12.144228 17577 thrift-server.cc:391] Command '/var/run/cloudera-scm-agent/process/2951-impala-CATALOGSERVER/altscript.sh sec-0-ssl_private_key_password_cmd' executed successfully, .PEM password retrieved

I0707 09:37:12.151124 17577 thrift-server.cc:449] ThriftServer 'CatalogService' started on port: 26000s
I0707 09:37:12.151144 17577 catalogd-main.cc:96] CatalogService started on port: 26000
I0707 09:37:12.232126 17997 TableLoader.java:97] Loaded metadata for: mssql_polybase.sample_data
I0707 09:37:12.846177 18039 thrift-util.cc:111] TAcceptQueueServer: Caught TException: No more data to read.
I0707 09:37:13.858829 18039 thrift-util.cc:111] TAcceptQueueServer: Caught TException: No more data to read.
I0707 09:37:14.869678 18039 thrift-util.cc:111] TAcceptQueueServer: Caught TException: No more data to read.

1 ACCEPTED SOLUTION

Expert Contributor

This issue was resolved after adding the --hostname flag and restarting the cluster.

Thank you, guys.



New Contributor

We are facing the same issue. Any help?

 

Expert Contributor

We have traced this issue to the Impala certificate and are now looking into it.

 

1. Check the certificate

openssl s_client -connect $hostname:$port -CAfile /abc/hadoop/cloudera-certs/impala-SAN.pem

 

2. Run hostname -f (This must give you the FQDN)
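
The two checks above matter together because TLS hostname verification compares the name a client connects with against the names in the certificate. Here is a minimal sketch of that comparison, assuming the certificate's subjectAltName lists only the FQDN. The function name and the SAN list are hypothetical, and this is a simplified illustration rather than Impala's actual verification code:

```python
from typing import Sequence


def matches_san(hostname: str, san_dns_names: Sequence[str]) -> bool:
    """Return True if hostname matches any DNS name taken from a certificate SAN list.

    Simplified rule: labels must match one by one, and a leading "*" label
    in the certificate name stands for exactly one hostname label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for pattern in san_dns_names:
        pat_labels = pattern.lower().rstrip(".").split(".")
        if len(pat_labels) != len(host_labels):
            continue
        if pat_labels[0] == "*" and len(pat_labels) > 1:
            if pat_labels[1:] == host_labels[1:]:
                return True
        elif pat_labels == host_labels:
            return True
    return False


# Assumed SAN content; read the real values from the openssl s_client output.
san = ["cba24uu.abc.cdb.com"]
print(matches_san("cba24uu.abc.cdb.com", san))  # FQDN, as printed by hostname -f
print(matches_san("cba24uu", san))              # short name
```

Under that assumption the FQDN matches and the short name does not, which would fit a manual test with the FQDN succeeding while a daemon that connects with the short name is rejected.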

Champion
I think a new "feature" was added in this latest release. We hit the same issue. We have SSL enabled for Impala with Kerberos. SSL worked for other services like the UIs and Impalad subscribing to the statestore, but catalogd continued to fail to subscribe to the statestore with the same error.

Cloudera Support kindly pointed out that it wasn't trying to communicate using the FQDN; just the hostname. They provided this information. We applied the change and Impala is operational again.

Cloudera Manager > Impala > Configuration

For Catalog > Catalog Server Command Line Argument Advanced Configuration Snippet (Safety Valve)

For StateStore > Statestore Command Line Argument Advanced Configuration Snippet (Safety Valve)

[configure]
--hostname=hostname.example.com
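
To avoid typing the name by hand, the flag line can be generated on each host. This is a rough sketch, assuming Python's socket.getfqdn() returns the same name as hostname -f on the machine; the helper name hostname_flag is hypothetical. It refuses a name without a dot, since a short name is exactly what this workaround is meant to avoid:

```python
import socket


def hostname_flag(fqdn: str) -> str:
    """Build the --hostname line for the safety valve from a fully qualified name."""
    name = fqdn.strip().rstrip(".")
    if "." not in name:
        # A name without a dot is a short hostname, not an FQDN.
        raise ValueError(f"{name!r} does not look like an FQDN")
    return f"--hostname={name}"


try:
    # Assumption: getfqdn() agrees with `hostname -f` on this host.
    print(hostname_flag(socket.getfqdn()))
except ValueError as err:
    print(f"Not usable as-is: {err}")
```

Paste the printed line into both safety valves named above, using the value produced on the host that runs the corresponding role.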

Champion
I ran this test and was able to connect to both the statestore and catalogd over SSL, but only because I was using the FQDN (hostname -f). The issue is that catalogd and the statestore use the short name for the statestore subscription after the upgrade. Either a bug was introduced, or this was the intended behavior and it was "fixed"; either way, you now need this configuration setting to get SSL for Impala to work.

Cloudera please fix the code or update the Impala SSL docs to reflect the need for this setting.

Expert Contributor

@mbigelow - Thank you for keeping the JIRA updated - I'm glad you found the solution through support. It looks like you are hitting a bug in CM and we are working on fixing it. I will reach out to our documentation team to point out this issue in the docs and the release notes of 5.11.1. I'm sorry for the troubles this has caused you.

Expert Contributor

After more investigation I found that this is already documented as a Known Issue in CM: Known Issues and Workarounds in Cloudera Manager 5

 

For Impala I opened IMPALA-5631 to explain the problem and possible solutions in the docs.

Champion
@Lars Volker Thanks for adding this bit of info. I was looking at IMPALA-5631 as a suspect but never thought to look at CM.

Lesson Learned: pay as much attention to CM release notes as I do CDH release notes.

Champion

@desind Could you share the operating system details, such as the name, version, and kernel version?

I'm curious to know.

Expert Contributor

After looking at https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html#conce...

 

To workaround this issue, upgrade to one of the following versions of Cloudera Manager before upgrading CDH:

  • 5.10.2
  • 5.8.6

It does not mention 5.11.1. So does this issue surface when using CM 5.11.1?

 

Expert Contributor

OK, thanks for confirming.

 

Once we made that change and restarted Impala, we saw another issue.

We added --hostname=cba24uu.abc.cdb.com to the two settings below and restarted Impala.

 

Catalog Server Command Line Argument Advanced Configuration Snippet (Safety Valve)

Statestore Command Line Argument Advanced Configuration Snippet (Safety Valve)

 

Error after change:

 

Statestore Logs 

 

I0707 09:15:34.784071 6030 statestore.cc:696] Unable to send topic update message to subscriber catalog-server@cba24uu.abc.cdb.com:26000, received error: Unexpected registration ID: 744638b525bdc432:7fe0e50a41d1a684, was expecting 6f42d3d3ce4b50ec:b68d9b0341657791

 

 

Expert Contributor

Full log file :

 

I0707 09:15:32.058609 5861 logging-support.cc:294] Old log file deleted during log rotation: /var/log/statestore/statestored.cba24uu.impala.log.ERROR.20170621-100605.21240
I0707 09:15:34.784071 6030 statestore.cc:696] Unable to send topic update message to subscriber catalog-server@cba24uu.abc.cdb.com:26000, received error:
Unexpected registration ID: 744638b525bdc432:7fe0e50a41d1a684, was expecting 6f42d3d3ce4b50ec:b68d9b0341657791
I0707 09:15:36.729167 6596 statestore.cc:381] Registering: catalog-server@cba24uu.abc.cdb.com:26000
I0707 09:15:36.730576 6596 statestore.cc:404] Subscriber 'catalog-server@cba24uu.abc.cdb.com:26000' registered (registration id: dd4a22df064b0c6f:2942c05b6aa152a3)
I0707 09:15:36.730842 6029 client-cache.h:260] client 0x4a05000 unexpected exception: TTransportException: Transport not open, type=N6apache6thrift9transport19TTransportExceptionE
I0707 09:15:36.730855 6029 client-cache.cc:81] ReopenClient(): re-creating client for cba24uu.abc.cdb.com:23020
I0707 09:15:36.730857 6042 client-cache.h:260] client 0x4a05140 unexpected exception: TTransportException: Transport not open, type=N6apache6thrift9transport19TTransportExceptionE
I0707 09:15:36.730866 6042 client-cache.cc:81] ReopenClient(): re-creating client for cba24uu.abc.cdb.com:23020
I0707 09:15:36.767303 6029 statestore.cc:696] Unable to send topic update message to subscriber catalog-server@cba24uu.abc.cdb.com:26000, received error:
Unexpected registration ID: dd4a22df064b0c6f:2942c05b6aa152a3, was expecting 6f42d3d3ce4b50ec:b68d9b0341657791
I0707 09:15:38.785959 6031 statestore.cc:696] Unable to send topic update message to subscriber catalog-server@cba24uu.abc.cdb.com:26000, received error
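
To see the pattern in these messages more easily, the ID pairs can be pulled out of the log. The sketch below uses only the message format visible in the lines above; the function name is hypothetical. In this excerpt the "was expecting" ID stays the same while the ID that is sent changes after the re-registration, which may point to a process still holding an older registration. That reading is an interpretation and is not confirmed in this thread:

```python
import re

# Matches the message text exactly as it appears in the statestore log above.
MISMATCH = re.compile(
    r"Unexpected registration ID: (?P<got>[0-9a-f]+:[0-9a-f]+), "
    r"was expecting (?P<expected>[0-9a-f]+:[0-9a-f]+)"
)


def registration_mismatches(log_text: str):
    """Return (sent_id, expected_id) pairs for every mismatch message in the text."""
    return [(m.group("got"), m.group("expected")) for m in MISMATCH.finditer(log_text)]


# Two messages copied from the log excerpt in this post.
sample = """\
Unexpected registration ID: 744638b525bdc432:7fe0e50a41d1a684, was expecting 6f42d3d3ce4b50ec:b68d9b0341657791
Unexpected registration ID: dd4a22df064b0c6f:2942c05b6aa152a3, was expecting 6f42d3d3ce4b50ec:b68d9b0341657791
"""

pairs = registration_mismatches(sample)
for got, expected in pairs:
    print(f"sent {got}, subscriber expected {expected}")

# Count the distinct IDs the subscriber side was expecting.
print(len({expected for _, expected in pairs}), "distinct expected ID(s)")
```

On a real system, read the statestore log file into a string and pass it to registration_mismatches instead of the sample text.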

Expert Contributor

This issue was resolved after adding the --hostname flag and restarting the cluster.

Thank you, guys.

Champion

@desind Thanks for the version information, mate. Much appreciated.

Expert Contributor

@csguna

 

2.6.32-573.22.1.el6.x86_64

Red Hat 6.7

Champion
@desind

No. The problem was more or less always there, but the Impala JIRA mentioned above, which fixed another issue, caused it to surface. Those CM releases include the CM fix that makes it communicate using the FQDN. For whatever reason, the fix didn't make it into CM 5.11.1. It should be in a future CM 5.11 release.

In short, either upgrade to one of those CM versions or use the --hostname setting for CM/CDH 5.11.1.
