Member since: 04-08-2016
Posts: 29
Kudos Received: 2
Solutions: 5

My Accepted Solutions
Title | Views | Posted
---|---|---
| 2625 | 12-08-2017 02:19 PM
| 13069 | 01-04-2017 02:58 PM
| 8660 | 12-08-2016 07:14 AM
| 6738 | 12-08-2016 07:12 AM
| 5198 | 06-14-2016 07:38 AM
12-08-2016
07:12 AM
Increasing the catalog server heap resolved this problem.
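For anyone hitting the same thing: we raised the heap in Cloudera Manager under the Impala Catalog Server configuration. A rough sketch for watching catalogd memory before it tips over, assuming the default catalogd debug web UI port (25020):

# Dump catalogd's memory breakdown from its built-in debug web server
# (25020 is the default port; adjust if your deployment overrides it)
$ curl -s http://catalog_host:25020/memz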
12-05-2016
12:22 PM
We found an out-of-memory heap error in catalogd, which is odd, since we've been restarting the individual Impala daemons with 'in flight' queries stuck to recover from this problem, not the catalog server. Does the Cloudera agent restart it automatically? The heap has been increased 4x, so we are in waiting mode to see if this resolves it. Thanks.
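A minimal sketch of how we found the OOM, assuming the CM default log directory for catalogd:

# List catalogd log files containing a JVM out-of-memory error
# (log path is the CM default; adjust for your install)
$ grep -li "OutOfMemoryError" /var/log/catalogd/*
# Show context around the most recent occurrence in the ERROR log
$ grep -i -B 2 -A 5 "OutOfMemoryError" /var/log/catalogd/catalogd.ERROR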
12-05-2016
09:49 AM
Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)
Parcel: CDH 5 5.8.2-1.cdh5.8.2.p0.3 (Distributed, Activated)

$ uname -a
Linux hostname_redacted 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

We initially thought we were exceeding impala-shell resources with our insert-select statement moving external CSV data into an internal Parquet table; however, a simple 'compute incremental stats tablename' has now become stuck as well. This is causing us grief in our production environment: we have to constantly check port 25000 and manually restart whichever Impala daemon is spinning the CPU. Luckily our insert scripts are fault tolerant and simply retry on failure (but once all CPUs are consumed spinning, we are dead in the water). We are not certain, but this seems to have started after we upgraded from 5.7.1 to 5.8.2. Immediately after the 'stuck' query, the log always shows this error:

I1204 03:30:03.958894 7150 Frontend.java:875] analyze query compute incremental stats tablename
I1204 03:30:03.959247 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:03.970648 7150 Frontend.java:894] Missing tables were not received in 120000ms. Load request will be retried.
I1204 03:32:03.970940 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:37.981461 7142 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()
    at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)
    at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:227)
    at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:180)
I1204 03:32:37.983515 7142 status.cc:111] CatalogException: Detected catalog service ID change. Aborting updateCatalog()
    @ 0x80f2c9 (unknown)
    @ 0xb37c30 (unknown)
    @ 0xa4e5cf (unknown)
    @ 0xa68ea9 (unknown)
    @ 0xb00a02 (unknown)
    @ 0xb068f3 (unknown)
    @ 0xd2bed8 (unknown)
    @ 0xd2b114 (unknown)
    @ 0x7dc26c (unknown)
    @ 0x1b208bf (unknown)
    @ 0x9b0a39 (unknown)
    @ 0x9b1492 (unknown)
    @ 0xb89327 (unknown)
    @ 0xb89c64 (unknown)
    @ 0xdee99a (unknown)
    @ 0x3f37a07aa1 (unknown)
    @ 0x3f376e893d (unknown)
E1204 03:32:37.983541 7142 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
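For reference, this is roughly how we spot the wedged daemon today, using each impalad's debug web UI on port 25000 (the hostname is a placeholder):

# The /queries page lists in-flight and completed queries; a statement that
# never leaves the in-flight list on one host points at the stuck daemon
$ curl -s http://impalad_host:25000/queries | grep -i "in flight"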
Labels:
- Apache Impala
12-03-2016
01:51 PM
We are getting stuck 'insert' Impala operations, which immediately afterwards log this error:

I1203 03:44:12.598379 25140 impala-beeswax-server.cc:171] query(): query=insert into table AParquetTable (params) partition select (more params) from AnExternalCsvTable
I1203 03:44:12.599138 25140 impala-beeswax-server.cc:544] TClientRequest.queryOptions: TQueryOptions { params }
I1203 03:44:12.600991 25140 Frontend.java:875] analyze query insert into table AParquetTable (params) partition select (more params) from AnExternalCsvTable
I1203 03:44:12.607447 25140 Frontend.java:819] Requesting prioritized load of table(s): default.aparquettable
I1203 03:44:43.429272 25124 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()
    at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)
    at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:227)
    at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:180)
I1203 03:44:43.431560 25124 status.cc:111] CatalogException: Detected catalog service ID change. Aborting updateCatalog()
    @ 0x80f2c9 (unknown)
    @ 0xb37c30 (unknown)
    @ 0xa4e5cf (unknown)
    @ 0xa68ea9 (unknown)
    @ 0xb00a02 (unknown)
    @ 0xb068f3 (unknown)
    @ 0xd2bed8 (unknown)
    @ 0xd2b114 (unknown)
    @ 0x7dc26c (unknown)
    @ 0x1b208bf (unknown)
    @ 0x9b0a39 (unknown)
    @ 0x9b1492 (unknown)
    @ 0xb89327 (unknown)
    @ 0xb89c64 (unknown)
    @ 0xdee99a (unknown)
    @ 0x3d05e07aa1 (unknown)
    @ 0x3d05ae893d (unknown)
E1203 03:44:43.431589 25124 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
I1203 04:47:33.657946 25141 impala-server.cc:1676] Connection from client 151.214.100.169:41748 closed, closing 1 associated session(s)
I1203 04:47:33.658761 25141 status.cc:111] Session closed
    @ 0x80f2c9 (unknown)
    @ 0xa727b9 (unknown)
    @ 0xa72cb8 (unknown)
    @ 0x9aac90 (unknown)
    @ 0x1b1ca93 (unknown)
    @ 0x1b03b09 (unknown)
    @ 0x9b0a39 (unknown)
    @ 0x9b1492 (unknown)
    @ 0xb89327 (unknown)
    @ 0xb89c64 (unknown)
    @ 0xdee99a (unknown)
    @ 0x3d05e07aa1 (unknown)
    @ 0x3d05ae893d (unknown)
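Our fault-tolerant wrapper is roughly the following sketch (the host, table names, and elided column lists are placeholders for our real ones):

# Re-run the insert-select until impala-shell exits cleanly, pausing between attempts
$ until impala-shell -i impalad_host -q "insert into AParquetTable partition (...) select ... from AnExternalCsvTable"; do sleep 60; done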
11-15-2016
02:24 PM
Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)
Parcel: CDH 5 5.8.2-1.cdh5.8.2.p0.3 (Distributed, Activated)

$ uname -a
Linux hostname_redacted 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

We are seeing this error 2 to 10 times per day. What is it? Thank you.

Log file created at: 2016/11/03 18:32:33
Running on machine: hostname_redacted
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1103 18:32:33.125335 3940 logging.cc:118] stderr will be logged to this file.
E1104 03:46:23.122285 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1106 03:36:54.086156 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1108 08:16:41.835364 4764 fe-support.cc:308] RPC client failed to connect: Couldn't open transport for hostname_redacted:26000 (connect() failed: Connection refused)
E1108 08:16:42.003527 4764 fe-support.cc:308] RPC client failed to connect: Couldn't open transport for hostname_redacted:26000 (connect() failed: Connection refused)
E1108 08:16:59.126739 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1108 09:32:24.577546 4766 fe-support.cc:308] RPC client failed to connect: Couldn't open transport for hostname_redacted:26000 (connect() failed: Connection refused)
E1108 09:32:24.595510 4766 fe-support.cc:308] RPC client failed to connect: Couldn't open transport for hostname_redacted:26000 (connect() failed: Connection refused)
E1108 09:32:40.664857 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1109 03:21:00.335299 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1109 07:47:05.972446 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1112 01:14:54.006551 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1112 03:15:03.835086 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
E1113 06:14:55.153087 4755 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
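To gauge frequency, we just tally the catalog-update errors by day from the impalad ERROR log (glog naming and the CM default log path assumed):

# The first five characters of each glog line (e.g. E1104) encode level + month/day
$ grep "Detected catalog service ID change" /var/log/impalad/impalad.ERROR | cut -c1-5 | sort | uniq -c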
Labels:
- Apache Impala
08-24-2016
03:27 PM
RedHat 6.7
Parcel: 5.7.0-1.cdh5.7.0.p0.45
RPMs:
cloudera-manager-server-5.7.0-1.cm570.p0.76.el6.x86_64
cloudera-manager-agent-5.7.0-1.cm570.p0.76.el6.x86_64
cloudera-manager-daemons-5.7.0-1.cm570.p0.76.el6.x86_64

The host monitor is running and the cluster is green, with no issues. We are running Parcels, so everything is the same version, yet we see this error over and over in the NameNode log:

2016-08-24 16:06:51,497 WARN BlockStateChange: BLOCK* processReport: Report from the DataNode (dc43ee8e-42ea-4d17-afb0-dc4816c5e4ca) is unsorted. This will cause overhead on the NameNode which needs to sort the Full BR. Please update the DataNode to the same version of Hadoop HDFS as the NameNode (2.6.0-cdh5.7.0).
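A sanity check we ran on each host to confirm the NameNode and DataNodes really report the same HDFS build (with parcels these should be identical):

# Compare the first line of output across the NameNode and every DataNode
$ hadoop version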
Labels:
- HDFS
08-24-2016
03:01 PM
1 Kudo
RedHat 6.7
Parcel: 5.7.0-1.cdh5.7.0.p0.45
RPMs:
cloudera-manager-server-5.7.0-1.cm570.p0.76.el6.x86_64
cloudera-manager-agent-5.7.0-1.cm570.p0.76.el6.x86_64
cloudera-manager-daemons-5.7.0-1.cm570.p0.76.el6.x86_64

The host monitor is running and the cluster is green, with no issues. We saw other posts relating to this WARN message, but in our case the system was created recently and is basically doing nothing; it runs Impala and a shared filesystem, and so far we do not observe any failures of either. Please advise on how to repair this issue and determine what failures it might be causing. The event log is full of chatter/noise, and this is one of the issues reported over and over:

2016-08-24 15:55:48,385 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.10.10.120:53028
2016-08-24 15:55:48,387 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x3567b06b3cc5dd5 with negotiated timeout 30000 for client /10.10.10.120:53028
2016-08-24 15:55:48,404 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.10.10.120:53032
2016-08-24 15:55:48,406 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.10.10.120:53032
2016-08-24 15:55:48,407 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x3567b06b3cc5dd6 with negotiated timeout 30000 for client /10.10.10.120:53032
2016-08-24 15:55:48,440 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1567b06b3ce5ea6
2016-08-24 15:55:48,441 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x3567b06b3cc5dd6
2016-08-24 15:55:48,442 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.10.10.120:53032 which had sessionid 0x3567b06b3cc5dd6
2016-08-24 15:55:48,443 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x2567b0773d75e32
2016-08-24 15:55:48,448 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x3567b06b3cc5dd5
2016-08-24 15:55:48,449 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x3567b06b3cc5dd5, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Unknown Source)
2016-08-24 15:55:48,449 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.10.10.120:53028 which had sessionid 0x3567b06b3cc5dd5
2016-08-24 15:56:48,413 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.10.10.120:53180
2016-08-24 15:56:48,413 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.10.10.120:53180
2016-08-24 15:56:48,414 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x3567b06b3cc5dd7 with negotiated timeout 30000 for client /10.10.10.120:53180
2016-08-24 15:56:48,450 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1567b06b3ce5ea8
2016-08-24 15:56:48,455 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x3567b06b3cc5dd7
2016-08-24 15:56:48,456 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.10.10.120:53180 which had sessionid 0x3567b06b3cc5dd7
2016-08-24 15:56:48,456 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x2567b0773d75e33
2016-08-24 15:56:48,459 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1567b06b3ce5ea7
2016-08-24 15:57:53,389 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.10.10.120:53342
2016-08-24 15:57:53,390 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.10.10.120:53342
2016-08-24 15:57:53,391 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x3567b06b3cc5dd8 with negotiated timeout 30000 for client /10.10.10.120:53342
2016-08-24 15:57:53,416 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.10.10.120:53348
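The connect/close cycles above repeat roughly once a minute from the same client IP, which looks more like a monitoring health check than a fault. A quick sketch to confirm the cadence, assuming a CM-default ZooKeeper log location (adjust the glob to your actual log file):

# Count session establishments per minute for the chatty client
$ grep -h "Established session" /var/log/zookeeper/*.log | grep "10.10.10.120" | cut -c1-16 | sort | uniq -c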
Labels:
- Apache Zookeeper
08-24-2016
02:43 PM
RedHat 6.7
Parcel: 5.7.0-1.cdh5.7.0.p0.45
RPMs:
cloudera-manager-server-5.7.0-1.cm570.p0.76.el6.x86_64
cloudera-manager-agent-5.7.0-1.cm570.p0.76.el6.x86_64
cloudera-manager-daemons-5.7.0-1.cm570.p0.76.el6.x86_64
The host monitor is running and the cluster is green, with no issues. However, the event log is full of chatter/noise, and this is one of the issues reported over and over:
# tail -n 100 /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-HOSTMONITOR-hostname.log.out
2016-08-24 15:45:47,969 ERROR com.cloudera.cmf.BasicScmProxy: Failed request to SCM: 302
2016-08-24 15:45:48,969 INFO com.cloudera.cmf.BasicScmProxy: Authentication to SCM required.
2016-08-24 15:45:49,027 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2016-08-24 15:45:49,031 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2016-08-24 15:47:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-08-24T20:47:55.591Z, forMigratedData=false
2016-08-24 15:49:56,764 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2016-08-24 15:52:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-08-24T20:52:55.591Z, forMigratedData=false
2016-08-24 15:52:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2016-08-24T20:50:00.000Z
2016-08-24 15:52:56,438 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.847S, numStreamsChecked=41065, numStreamsRolledUp=3228
2016-08-24 15:57:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-08-24T20:57:55.591Z, forMigratedData=false
2016-08-24 15:59:56,767 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2016-08-24 16:02:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-08-24T21:02:55.591Z, forMigratedData=false
2016-08-24 16:02:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2016-08-24T21:00:00.000Z
2016-08-24 16:02:56,527 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.936S, numStreamsChecked=41065, numStreamsRolledUp=3228
2016-08-24 16:02:56,527 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT600S to rollup=HOURLY for rollupTimestamp=2016-08-24T21:00:00.000Z
2016-08-24 16:02:57,381 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.854S, numStreamsChecked=41065, numStreamsRolledUp=3228
2016-08-24 16:07:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-08-24T21:07:55.591Z, forMigratedData=false
2016-08-24 16:09:56,772 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2016-08-24 16:12:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-08-24T21:12:55.591Z, forMigratedData=false
2016-08-24 16:12:55,591 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2016-08-24T21:10:00.000Z
2016-08-24 16:12:56,282 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.691S, numStreamsChecked=41065, numStreamsRolledUp=3228
2016-08-24 16:16:47,102 ERROR com.cloudera.cmf.BasicScmProxy: Failed request to SCM: 302
2016-08-24 16:16:48,103 INFO com.cloudera.cmf.BasicScmProxy: Authentication to SCM required.
2016-08-24 16:16:48,160 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2016-08-24 16:16:48,165 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
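To cut the noise down to events that might actually matter, we filter the firehose log to ERROR lines (same file as the tail above):

# The recurring pattern is a 302 from SCM followed by a successful re-authentication,
# so ERROR lines without a matching 'Authenticated to SCM' INFO are the ones to chase
$ grep " ERROR " /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-HOSTMONITOR-hostname.log.out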
Labels:
- Cloudera Manager
06-14-2016
07:38 AM
CDH 5.7.0 does not have this issue.