Created on 03-16-2017 02:19 AM - edited 09-16-2022 04:15 AM
Created 03-16-2017 01:33 PM
How exactly are you setting the flag in CM? Did you restart the Kudu service after setting it? Have you verified that it actually took effect? When a Kudu process starts up, it'll log its non-default command line arguments; you can see if maintenance_manager_num_threads is in that log output.
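To make the log check concrete, here is a minimal sketch that scans startup log text for the flag. It assumes the flag is logged in the usual gflags `--name=value` form; the sample text is a made-up excerpt for illustration, not real Kudu log output, and the function name is hypothetical.

```python
import re

FLAG = "maintenance_manager_num_threads"

def find_flag_values(log_text, flag=FLAG):
    """Return every value assigned to --<flag>=<value> in the text, in order."""
    pattern = re.compile(r"--" + re.escape(flag) + r"=(\S+)")
    return pattern.findall(log_text)

# Made-up excerpt (assumption): gflags-style lines as they might appear
# in the startup section of a tablet server log.
sample_log = """\
--some_other_flag=1
--maintenance_manager_num_threads=4
"""

values = find_flag_values(sample_log)
if values:
    # The last occurrence reflects the most recent startup in the file.
    print("flag value at last startup:", values[-1])
else:
    print("flag not found among the logged flags")
```

If the flag does not show up at all, the process is most likely still running with the default value.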
Created 03-16-2017 02:20 PM
Created 03-16-2017 07:09 PM
Created 03-16-2017 09:02 PM
Thank you for the info.
Did you have a chance to look into the tserver's log? It might be that, due to some bug, the settings from the gflagfile are not in effect because the tablet server hasn't been restarted with the new flags. Just trying to narrow it down.
Created 03-16-2017 09:29 PM
Another thing I'm curious about is what you see on the 'Thread Groups' page (/threadz) of the tablet server's internal webserver. Looking there would be useful as well.
If there is no activity at all, there will be just 1 item there, named something like 'maintenance_scheduler-15482927'. However, that does not mean you have just 1 maintenance thread. E.g., in my case 'Thread Group: maintenance' shows just 1 item like that, but in reality I have 6 idle threads waiting on a condition variable (I found that by attaching to the process with gdb, on OS X):
11 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
10 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
9 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
8 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
7 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
6 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
The stack of each worker thread looks like the following:
(gdb) bt
#0 0x00007fff8e7c7db6 in __psynch_cvwait ()
#1 0x00007fff8f673728 in _pthread_cond_wait ()
#2 0x0000000109e8d223 in kudu::ConditionVariable::Wait (this=0x10b610e28) at /Users/aserbin/Projects/kudu/src/kudu/util/condition_variable.cc:66
#3 0x000000010a056b76 in kudu::ThreadPool::DispatchThread (this=0x10b610d20, permanent=true) at /Users/aserbin/Projects/kudu/src/kudu/util/threadpool.cc:301
#4 0x000000010a05b2f9 in boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator() (this=0x10b6328e0, p=0x10b610d20, a1=true) at mem_fn_template.hpp:165
#5 0x000000010a05b257 in boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0> (this=0x10b6328f0, f=@0x10b6328e0, a=@0x70000030fc50) at bind.hpp:315
#6 0x000000010a05b1da in boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator() (this=0x10b6328e0) at bind.hpp:895
#7 0x000000010a05af60 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke (function_obj_ptr=@0x10b697490) at function_template.hpp:159
#8 0x0000000107c27f68 in boost::function0<void>::operator() (this=0x10b697488) at function_template.hpp:770
#9 0x000000010a047851 in kudu::Thread::SuperviseThread (arg=0x10b697440) at /Users/aserbin/Projects/kudu/src/kudu/util/thread.cc:590
#10 0x00007fff8f67299d in _pthread_body ()
#11 0x00007fff8f67291a in _pthread_start ()
#12 0x00007fff8f670351 in thread_start ()
As you can see, the stack trace of an idle maintenance thread does not contain anything that looks like MaintenanceManager. That might be the reason you didn't see the expected number of stack traces collected by pstack.
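Since the frames of an idle worker don't mention MaintenanceManager, counting by thread name is more reliable than searching the stacks. A rough sketch, assuming a thread listing in the same shape as the gdb output quoted in this post (thread number, quoted name, address, function); the helper name is hypothetical and other tools may format the listing differently:

```python
import re
from collections import Counter

# Matches lines such as:
#   11 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
THREAD_LINE = re.compile(r'^\s*(\d+)\s+"([^"]+)"\s+0x[0-9a-fA-F]+\s+in\s+(\S+)')

def count_threads_by_name(listing):
    """Count threads per name, ignoring the trailing -<digits> suffix."""
    counts = Counter()
    for line in listing.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            base_name = re.sub(r"-\d+$", "", match.group(2))
            counts[base_name] += 1
    return counts

listing = """\
11 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
10 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
9 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
8 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
7 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
6 "MaintenanceMgr [worker]-154828" 0x00007fff8e7c7db6 in __psynch_cvwait ()
"""

counts = count_threads_by_name(listing)
print(counts["MaintenanceMgr [worker]"])  # 6 worker threads in this listing
```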
So, I would recommend clarifying this step by step. First, check whether the kudu-tserver process is running with the expected flags. If it is, the next step would be to understand how much activity there is in your system, i.e. whether you should expect those threads to be busy and show the corresponding stack traces.
Created 03-20-2017 07:12 PM
Created 03-21-2017 11:30 AM
By default, CM will warn when 50% of a process' FDs are in use. Also by default, Kudu's block manager system will use 50% of the FDs available to the process. So, after accounting for some additional FDs for WALs, Kudu ends up using a little over 50% of the available FDs and CM warns about it.
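The arithmetic can be sketched as follows. The FD limit and the WAL FD count are made-up numbers for illustration, the function names are hypothetical, and the sketch assumes the block manager has filled its whole 50% share and that the warning fires strictly above the threshold:

```python
def fd_usage_fraction(fd_limit, wal_fds, block_manager_share=0.5):
    """Fraction of the process FD limit in use when the block manager
    holds its full share and the WALs hold wal_fds more."""
    block_manager_fds = int(fd_limit * block_manager_share)
    return (block_manager_fds + wal_fds) / fd_limit

def cm_would_warn(fd_limit, wal_fds, warn_threshold=0.5):
    """True if usage exceeds the warning threshold (assumed strict '>')."""
    return fd_usage_fraction(fd_limit, wal_fds) > warn_threshold

# Hypothetical numbers: a 32768 FD limit and 200 FDs held for WALs.
usage = fd_usage_fraction(32768, wal_fds=200)
print(f"{usage:.1%} of FDs in use, warning: {cm_would_warn(32768, wal_fds=200)}")
```

With any nonzero number of WAL FDs on top of a full block manager share, usage lands just over 50%, which is why the warning appears.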
If this bothers you, you can: