We just upgraded our Cloudera Manager instance from 4.x to the latest and greatest, 5.1. We have not upgraded our clusters yet; that will come later.
Since the upgrade we have issues with many of our charts. Many charts now report this warning when they run:
"The query returned partial results. Exceeded the time series 600 stream limit: 'SELECT rpc_call_queue_length WHERE serviceName=XXXX AND roleType = REGIONSERVER"
When we see this, we only get a subset of the data we used to get with Cloudera Manager 4. This is only a 65-node cluster, but it is displaying data for just 3 nodes for the above query. I increased the time-series stream limit from 250 to 600, which brought me from 1 node to 3.
Any ideas why this is happening post-upgrade and what we can do to resolve it?
Yeah, I just kept increasing the limit until this eventually disappeared. I increased it to 40,000 and the issue went away. No other side effects that we've noticed, and I've run it this way for ~5 years.
Slight thread necromancy here, but I'm glad this old post helped someone.
The maximum number of streams exists to help protect the CM Server and the Service Monitor. If a 'bad' query is issued, a raised limit lets it impose extra load on both: memory and CPU consumption on the CM Server and Service Monitor, LevelDB read load on the Service Monitor, network throughput between the CM Server and Service Monitor, and so on.
So it is recommended to either refine the query or shorten the time range.
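As a sketch of what "refine the query" can mean in practice: adding more predicates makes the tsquery match fewer streams. For example (the serviceName and hostname values below are placeholders for your deployment), restricting the RegionServer query from the error message above to a single host cuts the result from one stream per RegionServer down to one:

```sql
SELECT rpc_call_queue_length
WHERE serviceName = "hbase"
  AND roleType = REGIONSERVER
  AND hostname = "node01.example.com"
```

Shortening the chart's time range reduces load in a similar way, since fewer data points are read for each matched stream.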
We have an internal JIRA (OPSAPS-44073) that will improve our documentation to mention that the tsquery and time range need to be adjusted to refine the query results, instead of just increasing the limit itself.
Thanks and hope this helps,
I'm glad you guys are working on fixing the known issue; however, your response isn't very helpful.
Simple queries simply cannot be run at all with the default stream limit against a medium- to large-sized cluster. There's no amount of optimization that will make simple queries run against a 200-node cluster; even the default dashboards break.
Recommending we shorten the time range is silly. There is an arbitrary limit on streams that is preventing us from using Cloudera Manager for its sole job, and you are telling us not to use Cloudera Manager for historical data instead of increasing the stream limit?
Hi @paulusdd ,
Thanks for your feedback.
It is OK to increase the stream limit as long as you have enough system resources to handle it. I just wanted to point out that the increase will have a performance impact on the CM Server and the Service Monitor role.
It depends on how many graphs you are pulling up, how many metrics per graph, and how often. For context, we run Cloudera Manager on a 4-core VM with 250+ cluster nodes attached to it, and it works fine.
In my cluster, the host running my Cloudera Manager Server has 24 cores in total, and the host running the Service Monitor also has 24 cores. My cluster has 65 nodes. The stream limit is currently 250.
My questions are:
- If I increase the stream limit to a higher value, will it affect the Cloudera Manager Server and Host Monitor?
- How many extra cores, or how much extra memory, would be required to increase the stream limit without affecting the Cloudera Manager Server and Service Monitor?
- How many cores and how much memory are required to process tsqueries properly without affecting the Cloudera Manager Server and Service Monitor?