Member since: 02-27-2019
Posts: 11
Kudos Received: 0
Solutions: 0
03-29-2019
04:22 AM
This is not ideal, because the user then has to deal with old logs manually. With dozens of tablet servers, that requires some automated configuration management (Ansible or a similar tool). Usually log retention is a function of the logging framework itself (like Log4j).
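Until such retention is handled by the logging framework itself, a possible workaround is a small external cleanup job that the configuration management tool pushes to every node. Below is a minimal sketch in Python; the log directory, the file name pattern, and the retention count are assumptions to adjust, not Kudu defaults.

#!/usr/bin/env python3
# Keep only the N newest rotated Kudu log files in a log directory.
# Assumptions: logs live under LOG_DIR and follow a glog-style naming
# scheme; adjust LOG_DIR, PATTERN and KEEP for your deployment.
import glob
import os

LOG_DIR = "/var/log/kudu"               # assumption: your --log_dir
PATTERN = "kudu-tserver.*.log.INFO.*"   # assumption: rotated INFO files
KEEP = 10                               # number of newest files to keep

def prune_old_logs(log_dir: str, pattern: str, keep: int) -> None:
    # Sort matching files newest-first by modification time,
    # then delete everything beyond the first `keep` files.
    files = sorted(
        glob.glob(os.path.join(log_dir, pattern)),
        key=os.path.getmtime,
        reverse=True,
    )
    for path in files[keep:]:
        os.remove(path)

if __name__ == "__main__":
    prune_old_logs(LOG_DIR, PATTERN, KEEP)

Scheduled via cron (or an Ansible-managed timer) on every master and tablet server, something like this keeps the on-disk log footprint bounded regardless of how quickly the files roll.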
03-29-2019
04:10 AM
The Kudu rebalance tool crashes, both when I run it from the command line and when I run it via the Cloudera Manager UI.
Here is the stderr displayed in Cloudera Manager:
+ exec kudu cluster rebalance master1.domain.com,master2.domain.com,master3.domain.com --max_moves_per_server=5 --max_run_time_sec=0 --max_staleness_interval_sec=300
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
*** Aborted at 1553857242 (unix time) try "date -d @1553857242" if you are using GNU date ***
PC: @ 0x7fbfc3637207 __GI_raise
*** SIGABRT (@0x3ec00005c50) received by PID 23632 (TID 0x7fbfc5c83a00) from PID 23632; stack trace: ***
@ 0x7fbfc5642680 (unknown)
@ 0x7fbfc3637207 __GI_raise
@ 0x7fbfc36388f8 __GI_abort
@ 0x7fbfc3f467d5 __gnu_cxx::__verbose_terminate_handler()
@ 0x7fbfc3f44746 (unknown)
@ 0x7fbfc3f44773 std::terminate()
@ 0x7fbfc3f44993 __cxa_throw
@ 0x7fbfc3f99dd5 std::__throw_regex_error()
@ 0x931c32 std::__detail::_Compiler<>::_M_bracket_expression()
@ 0x931e3a std::__detail::_Compiler<>::_M_atom()
@ 0x932469 std::__detail::_Compiler<>::_M_alternative()
@ 0x9324c4 std::__detail::_Compiler<>::_M_alternative()
@ 0x932649 std::__detail::_Compiler<>::_M_disjunction()
@ 0x93297b std::__detail::_Compiler<>::_Compiler()
@ 0x932cb7 std::__detail::__compile<>()
@ 0x92bfc6 (unknown)
@ 0x92c664 std::_Function_handler<>::_M_invoke()
@ 0xde6672 kudu::tools::Action::Run()
@ 0x9957d7 kudu::tools::DispatchCommand()
@ 0x99619b kudu::tools::RunTool()
@ 0x8dee4d main
@ 0x7fbfc36233d5 __libc_start_main
@ 0x9284b5 (unknown)
I have already created an issue: https://issues.apache.org/jira/browse/KUDU-2753.
It is strange that there was no such issue in Jira yet. Has anybody faced this issue before?
Labels: Apache Kudu, Cloudera Manager
03-28-2019
08:07 AM
We already have 10+ GB of logs for the Kudu Masters and 30+ GB of logs for the Kudu Tablet Servers. How can I limit the number of log files to keep? The Kudu Configuration Reference mentions only a limit on the log file size (--max_log_size), but no limit on the number of log files.
Labels: Apache Kudu
03-20-2019
12:08 AM
16812 tablets. I see in the Known Issues that the recommended value is 1000 tablets per tablet server. Maybe this is the cause of the problem.
03-19-2019
02:39 AM
Hi,
Now the tablet server is out of memory:
Total consumption: 8.11G
Memory limit: 8.00G
Percentage consumed: 101.44%
Here is the heap sample from the failing node: https://sendeyo.com/up/d/d26504e2ef
Here is the output of the "kudu fs check" command:
$ kudu fs check -fs_wal_dir=/data/kudu -fs_data_dirs=/data/kudu
I0319 10:23:40.103991 157327 fs_manager.cc:260] Metadata directory not provided
I0319 10:23:40.104055 157327 fs_manager.cc:263] Using existing metadata directory in first data directory
W0319 10:23:40.104472 157327 data_dirs.cc:615] IO error: Could not lock /data/kudu/data/block_manager_instance: Could not lock /data/kudu/data/block_manager_instance: lock /data/kudu/data/block_manager_instance: Resource temporarily unavailable (error 11)
W0319 10:23:40.104492 157327 data_dirs.cc:616] Proceeding without lock
I0319 10:23:40.109122 157327 fs_manager.cc:397] Time spent opening directory manager: real 0.005s user 0.000s sys 0.000s
I0319 10:23:40.109161 157327 env_posix.cc:1634] Not raising this process' open files per process limit of 65535; it is already as high as it can go
I0319 10:23:40.109180 157327 file_cache.cc:470] Constructed file cache lbm with capacity 26214
I0319 10:23:50.317314 157332 log_block_manager.cc:2367] Opened 584 log block containers in /data/kudu/data
I0319 10:24:00.467881 157332 log_block_manager.cc:2367] Opened 700 log block containers in /data/kudu/data
I0319 10:24:10.991781 157332 log_block_manager.cc:2367] Opened 766 log block containers in /data/kudu/data
I0319 10:24:20.999274 157332 log_block_manager.cc:2367] Opened 882 log block containers in /data/kudu/data
I0319 10:24:25.138837 157332 log_block_manager.cc:2437] Read-only block manager, skipping repair
I0319 10:24:25.146647 157327 fs_manager.cc:417] Time spent opening block manager: real 45.037s user 0.000s sys 0.000s
I0319 10:24:25.146875 157327 fs_manager.cc:428] Opened local filesystem: /data/kudu
uuid: "d8b982ee9d6348a29153e4540c6c425a"
format_stamp: "Formatted at 2017-12-20 17:21:05 on ossvert4"
Block manager report
--------------------
1 data directories: /data/kudu/data
Total live blocks: 1933969
Total live bytes: 61130157258
Total live bytes (after alignment): 66793267200
Total number of LBM containers: 906 (354 full)
Total missing blocks: 0
Total orphaned blocks: 514 (0 repaired)
Total orphaned block bytes: 609251411 (0 repaired)
Total full LBM containers with extra space: 37 (0 repaired)
Total full LBM container extra space in bytes: 2317418496 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)
The heap sample contains a surprising entry: "kudu HdrHistogram Init 4502.0 (54.4%)". Maybe I am missing something, but it looks like a utility that collects usage metrics. Why did it consume more than 50% of the memory allocated to the tablet server?
03-18-2019
08:31 AM
I am trying to figure out why all three of my tablet servers run out of memory, but it is hard to do.
Configuration: 3 tablet servers, each with memory_limit_hard_bytes set to 8 GB. All Kudu operations are performed via Impala JDBC.
Symptoms: INSERT operations fail with errors like:
Service unavailable: Soft memory limit exceeded (at 101.27% of capacity
The tablet server web interface (http://<kudu-tablet-server>:8050/mem-trackers) shows that all the memory is consumed:
Total consumption: 8.08G
Memory limit: 8.00G
Percentage consumed: 100.98%
Another table on this web page displays details about memory consumption:
Id          Parent   Limit    Current Consumption   Peak consumption
root        none     none     1.00G                 1.01G
log_cache   root     1.00G    835.7K                2.59M
...         ...      ...      ...                   ...
However, when I sum up all the entries in the "Current Consumption" column, I get a much lower value: 2.6 GB. My question: who ate the remaining (8 - 2.6) = 5.4 GB?
How did I sum up all the entries?
- parse the HTML code of the web page and extract the table with the detailed memory consumption
- save this table as a CSV file
- read it in spark-shell
- convert all memory consumption values to bytes (from GB, MB, KB); a rough sketch of this step is shown below
- finally, calculate the sums for the "Current Consumption" and "Peak consumption" columns
Uh...
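For reference, here is roughly what that conversion and summation look like as a small Python script instead of spark-shell. It assumes the extracted table has been saved as mem_trackers.csv with a "Current Consumption" column; the file name and the unit suffixes (B/K/M/G) are assumptions based on how the web UI formats the values.

# Sum the "Current Consumption" column of the exported /mem-trackers table.
# Assumption: the table was saved as mem_trackers.csv with the column
# headers shown above; values look like '8.08G', '835.7K' or 'none'.
import csv
import re

UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(value: str) -> int:
    # Convert strings like '8.08G' or '835.7K' to bytes; anything that
    # does not look like a size (e.g. 'none') counts as 0.
    match = re.fullmatch(r"([\d.]+)\s*([BKMG]?)", value.strip())
    if not match:
        return 0
    number, unit = match.groups()
    return int(float(number) * UNITS.get(unit or "B", 1))

with open("mem_trackers.csv", newline="") as f:
    rows = list(csv.DictReader(f))

total = sum(to_bytes(row["Current Consumption"]) for row in rows)
print(f"Sum of 'Current Consumption': {total / 1024**3:.2f} GB")

This only reproduces the manual bookkeeping described above; if the trackers form a parent/child hierarchy (as the Parent column suggests), summing every row may count some memory more than once.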
Labels: Apache Impala, Apache Kudu
03-14-2019
11:45 PM
Cloudera pricing is per-node: https://www.cloudera.com/products/pricing.html
Is it only for service nodes, or for gateway nodes as well? For example, I have a cluster:
- HDFS NameNode: 1 node
- HDFS DataNode: 3 nodes
- HDFS gateway with my custom app using HDFS: 1 node
What is the number of nodes to pay for in this example?
Labels: Cloudera Manager
03-01-2019
12:45 AM
You have mentioned that NTP is not related to the problem. Let's consider this scenario:
1. The Impala Daemon is working with the READ_AT_SNAPSHOT setting enabled. The Impala daemon makes a read operation in Kudu and sets the read timestamp T1 immediately after the preceding write operation.
2. Kudu dispatches the read request to some replica R1. This replica R1 is running on a machine with poorly configured NTP, so the local time on this machine is 1 minute behind.
3. The replica R1 waits for the timeout specified by '--safe_time_max_lag_ms': 30 seconds. After the timeout, the local time is still (at best) 30 seconds behind T1.
Does this lead to the problem under discussion: 'Tablet is lagging too much to be able to serve snapshot scan'?
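To make the numbers concrete, here is a rough back-of-the-envelope model of this scenario in Python. It is not Kudu's actual implementation; the assumption, taken from the flag description quoted elsewhere in this thread, is that a snapshot scan at timestamp T1 can only be served while the replica's safe time lags T1 by no more than --safe_time_max_lag_ms. All names and numbers are illustrative.

# Toy model of the scenario above; NOT Kudu's real safe-time logic.
SAFE_TIME_MAX_LAG_MS = 30_000   # --safe_time_max_lag_ms
CLOCK_SKEW_MS = 60_000          # R1's local clock is 1 minute behind
WAIT_MS = 30_000                # how long R1 waits before responding

t1_ms = 0                       # read timestamp T1 (true time, arbitrary origin)

# R1's safe time tracks its skewed local clock; after waiting it has
# advanced by WAIT_MS but is still behind T1 by the remaining skew.
safe_time_ms = t1_ms - CLOCK_SKEW_MS + WAIT_MS
lag_ms = t1_ms - safe_time_ms

print(f"safe time lag after waiting: {lag_ms} ms "
      f"(--safe_time_max_lag_ms threshold: {SAFE_TIME_MAX_LAG_MS} ms)")
# With these numbers the remaining lag sits right at the threshold;
# whether that still triggers the 'Tablet is lagging too much' error
# is exactly the question asked above.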
02-28-2019
02:23 AM
Thank you for your detailed reply. Indeed, this lagging replica is located on a machine belonging to a different subnetwork. This might cause an additional network delay.
You mention the '--safe_time_max_lag_ms' parameter that controls the acceptable replica lag. I see in the Kudu Configuration Reference that this parameter is available both for kudu-master and kudu-tserver. Which of the two should I tune? The kudu-tserver one? What is the purpose of the other one, for kudu-master? The descriptions of both are identical and do not bring much clarity: 'The maximum amount of time we allow safe time to lag behind the requested timestamp before forcing the client to retry, in milliseconds.' Is it valid to have different '--safe_time_max_lag_ms' values for different kudu-tservers, so that the distant tserver has a higher max lag value?
Regarding READ_YOUR_WRITES mode, I have checked the Impala SQL Reference for CDH 6.1 and it does not provide this mode. Anyway, we are stuck with CDH 5.14, so we cannot even use the 'SET KUDU_READ_MODE ...' Impala statement. With CDH 5.14, the only option is to configure the Impala Daemon with '--kudu_read_mode=READ_AT_SNAPSHOT'.
Could you also explain how this new READ_YOUR_WRITES mode works? The API doc is not clear on this. Does this mean that the client automatically takes the read timestamp as the timestamp of the preceding write? In that case, what is the difference from the READ_AT_SNAPSHOT mode when the client does not specify the read timestamp at all?
Thanks