Support Questions

jaeseung · 11-25-2021

Hello,

The error message below keeps occurring.

The timeout appears to start with the API requests that SMM sends to CM.

Could you provide overall guidance on resolving this?

TimePeriod : LAST_ONE_WEEK, Error while fetching cluster metrics : [MetricDescriptor{metricName=MetricName(name=sum(kafka_bytes_fetched_by_partition_rate), tags=[partition, serviceName, topic], valueType=LONG, singlePointOfValue=true), queryTags={serviceName=kafka, topic=%, partition=%}, aggrFunction=SUM, postProcessFunction=null, valueType=LONG}, MetricDescriptor{metricName=MetricName(name=sum(kafka_messages_received_by_partition_rate), tags=[partition, serviceName, topic], valueType=LONG, singlePointOfValue=true), queryTags={serviceName=kafka, topic=%, partition=%}, aggrFunction=SUM, postProcessFunction=null, valueType=LONG}, MetricDescriptor{metricName=MetricName(name=sum(kafka_bytes_received_by_partition_rate), tags=[partition, serviceName, topic], valueType=LONG, singlePointOfValue=true), queryTags={serviceName=kafka, topic=%, partition=%}, aggrFunction=SUM, postProcessFunction=null, valueType=LONG}]
com.hortonworks.smm.kafka.services.common.errors.InvalidCMApiResponseException: Invalid response returned CM API: http://icahubkafka005.datahub.skhynix.com:7180/api/v32/timeseries, response.status: 500,response.message: {
"message" : "java.util.concurrent.TimeoutException"
}
at com.hortonworks.smm.kafka.services.metric.cm.CMMetricsFetcher.cmApiCall(CMMetricsFetcher.java:389)
at com.hortonworks.smm.kafka.services.metric.cm.CMMetricsFetcher.cmApiPost(CMMetricsFetcher.java:368)
at com.hortonworks.smm.kafka.services.metric.cm.CMMetricsFetcher.getMetricsFromCmApi(CMMetricsFetcher.java:479)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1699)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:474)
at com.hortonworks.smm.kafka.services.metric.cm.CMMetricsFetcher.queryMetrics(CMMetricsFetcher.java:464)
at com.hortonworks.smm.kafka.services.metric.cm.CMMetricsFetcher.getClusterMetrics(CMMetricsFetcher.java:184)
at com.hortonworks.smm.kafka.services.metric.cache.MetricsCache$RefreshMetricsCacheTask.lambda$null$21(MetricsCache.java:623)
at com.hortonworks.smm.kafka.services.metric.cache.MetricsCache$RefreshMetricsCacheTask.fetchMetrics(MetricsCache.java:575)
at com.hortonworks.smm.kafka.services.metric.cache.MetricsCache$RefreshMetricsCacheTask.lambda$refreshClusterMetrics$22(MetricsCache.java:622)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Nandinin · 12-02-2021

Hello,

This timeout exception is related to the CM Metrics Store (Firehose) being overloaded.

Please review the following article: https://community.cloudera.com/t5/Customer/How-to-enable-the-entity-summary-servlet-in-Cloudera-Mana...
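To check whether CM/Firehose is the bottleneck rather than SMM itself, you could time a comparable one-week query against the same CM timeseries endpoint directly. The sketch below is an illustration under assumptions: it uses a GET request with `query`/`from`/`to` parameters instead of the POST that SMM sends, the tsquery string is only modeled on the metric names in the log above and may need adjusting, and the host, `admin`/`admin` credentials, and 5-minute client timeout are hypothetical placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;

class CmTimeseriesProbe {

    // Builds a GET request for the CM timeseries endpoint covering [from, to].
    static HttpRequest buildRequest(String cmBaseUrl, String user, String password,
                                    String tsquery, Instant from, Instant to, Duration timeout) {
        String url = cmBaseUrl + "/api/v32/timeseries"
                + "?query=" + URLEncoder.encode(tsquery, StandardCharsets.UTF_8)
                + "&from=" + URLEncoder.encode(from.toString(), StandardCharsets.UTF_8)
                + "&to=" + URLEncoder.encode(to.toString(), StandardCharsets.UTF_8);
        // HTTP basic authentication header.
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholders: pass your own CM URL and use real credentials.
        String cm = args.length > 0 ? args[0] : "http://cm-host.example.com:7180";
        Instant to = Instant.now();
        Instant from = to.minus(Duration.ofDays(7)); // same window as LAST_ONE_WEEK
        HttpRequest req = buildRequest(cm, "admin", "admin",
                "select kafka_bytes_received_by_partition_rate where serviceName=\"kafka\"",
                from, to, Duration.ofMinutes(5));
        long start = System.nanoTime();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // A slow response or a 500 with a TimeoutException message points at CM/Firehose.
        System.out.println("HTTP " + resp.statusCode() + " after " + elapsedMs + " ms");
    }
}
```

If the same query is slow or fails when sent directly to CM, the problem lies with the Metrics Store and not with SMM's own request handling.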

Check the KAFKA_PRODUCER and KAFKA_CONSUMER entities. If there are too many of them (millions), SMON (the Service Monitor) may request a lot of memory to process the metrics, which causes timeout exceptions in the SMM server.
- To avoid creating a huge number of entities, which causes issues in services such as SMON, set a client.id in your producers and, on the consumer side, a group.id, so that a new random ID is not created every time a client is executed.
Alternatively, resetting or deleting the Firehose LevelDB storage could be an option to recover from this state.
If the SMM server is getting timeout exceptions, also check the SMM heap size. Depending on the number of resources being monitored, it is recommended to increase it; acceptable values for production environments are between 8 and 16 GB.
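The client.id / group.id advice above amounts to giving each client a stable identity in its configuration. Below is a minimal sketch, assuming plain `java.util.Properties` of the kind passed to the `KafkaProducer`/`KafkaConsumer` constructors. The broker address and the IDs `orders-producer` and `orders-consumers` are hypothetical, and `distinctIdsAfterRuns` is a simple simulation (not a Kafka or CM API) showing how per-run random IDs multiply the number of distinct client identities that monitoring has to track.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.UUID;

class StableClientIds {

    // Producer config with a fixed, meaningful client.id.
    static Properties producerProps(String bootstrap, String clientId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrap);
        p.put("client.id", clientId);
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }

    // Consumer config with a fixed group.id instead of a random one per run.
    static Properties consumerProps(String bootstrap, String groupId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrap);
        p.put("group.id", groupId);
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return p;
    }

    // Simulation: counts distinct IDs seen across runs, random-per-run vs. fixed.
    static int distinctIdsAfterRuns(int runs, boolean randomPerRun) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < runs; i++) {
            String id = randomPerRun ? "producer-" + UUID.randomUUID() : "orders-producer";
            seen.add(id);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(producerProps("broker1:9092", "orders-producer"));
        System.out.println(consumerProps("broker1:9092", "orders-consumers"));
        System.out.println("random IDs after 1000 runs: " + distinctIdsAfterRuns(1000, true));
        System.out.println("fixed ID after 1000 runs: " + distinctIdsAfterRuns(1000, false));
    }
}
```

With a fixed ID, repeated executions of the same client map to one identity instead of adding a new one each time.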

SME || Kafka | Schema Registry | SMM | SRM

VidyaSargur · 12-06-2021

@jaeseung, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

Regards,

Vidya Sargur,
Community Manager

TimeoutException in Stream Messaging Manager