07-21-2014 01:32 PM
We just upgraded our Cloudera Manager instance from 4.x to the latest and greatest, 5.1. We have not upgraded our clusters yet; that will come later.
Since the upgrade, we have had issues with many of our charts. Many charts now report this warning when they run:
"The query returned partial results. Exceeded the time series 600 stream limit: 'SELECT rpc_call_queue_length WHERE serviceName=XXXX AND roleType = REGIONSERVER'"
When we see this, we get only a subset of the data we used to get with Cloudera Manager 4. This is only a 65-node cluster, but the query above displays data for just 3 nodes. I increased the time series stream limit from 250 to 600, which took us from 1 node to 3.
Any ideas why this is happening post upgrade and what we can do to resolve it?
07-21-2014 01:41 PM
04-11-2019 10:53 AM
Yeah, I just kept increasing the limit until the warning eventually disappeared. I increased it to 40,000 and the issue went away. We haven't noticed any other side effects, and I've run it this way for about 5 years.
Slight thread necromancy here, but I'm glad this old post helped someone.
04-11-2019 02:19 PM
The maximum number of streams exists to help protect the CM server and the Service Monitor. For example, if some 'bad' queries are issued, a higher limit lets them impose more load: memory and CPU consumption on both the CM server and the Service Monitor, LevelDB read load on the Service Monitor, network throughput on the CM server and the Service Monitor, and so on.
So we recommend either refining the query or shortening the time range.
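If you take the advice to shorten the time range but still want to look at a long period, one option is to split the long range into shorter windows on the client side and issue one query per window. The sketch below is a minimal, hypothetical illustration of that idea only. It assumes, as the reply above suggests, that a shorter time range keeps each query under the stream limit. `run_query` is a hypothetical callable standing in for however you actually send a tsquery to Cloudera Manager; no real CM API call is shown or implied, and the window size you need is something to find by testing against your own cluster.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def split_time_range(
    start: datetime, end: datetime, window: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end).

    The last window is truncated at `end` if the range does not divide evenly.
    """
    if window <= timedelta(0):
        raise ValueError("window must be a positive timedelta")
    cursor = start
    while cursor < end:
        window_end = min(cursor + window, end)
        yield cursor, window_end
        cursor = window_end


def query_in_windows(
    run_query: Callable[[str, datetime, datetime], List[T]],
    query: str,
    start: datetime,
    end: datetime,
    window: timedelta,
) -> List[T]:
    """Run the same query once per window and concatenate the results.

    `run_query` is a hypothetical hook (query, window_start, window_end) -> rows;
    plug in your own client code here.
    """
    results: List[T] = []
    for window_start, window_end in split_time_range(start, end, window):
        results.extend(run_query(query, window_start, window_end))
    return results
```

For example, a 5-hour range with a 2-hour window becomes three queries: 00:00 to 02:00, 02:00 to 04:00, and 04:00 to 05:00. The trade-off is more round trips to the server in exchange for smaller individual queries, which is the direction the reply recommends instead of raising the limit.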
We have an internal JIRA (OPSAPS-44073) to improve our documentation so that it explains that the tsquery and time range need to be adjusted to refine the query results, instead of just increasing the limit itself.
Thanks and hope this helps,
04-12-2019 08:42 AM
I'm glad you guys are working on fixing the known issue; however, your response isn't very helpful.
Simple queries cannot be run at all with the default stream limit against a medium to large cluster. There's no amount of optimization that will make simple queries run against a 200-node cluster; even the default dashboards break.
Recommending that we shorten the time range is silly. There is an arbitrary limit on streams that is preventing us from using Cloudera Manager for its sole job, and instead of increasing the limit you are telling us not to use Cloudera Manager for historical data?
04-12-2019 01:37 PM
Hi @paulusdd,
Thanks for your feedback.
It is OK to increase the stream limit as long as you have enough system resources to handle it. I just wanted to point out that the increase will have a performance impact on the CM server and the Service Monitor role.