Member since: 02-27-2019
Posts: 11
Kudos Received: 0
Solutions: 0
03-29-2019
04:22 AM
This is not ideal, because the user then has to deal with old logs manually. With dozens of tablet servers, that requires some automated configuration management (Ansible or a similar tool). Usually log retention is a function of the logging framework itself (like Log4j).
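Until such retention is handled by the logging framework itself, a possible workaround is a small external cleanup job that the configuration management tool pushes to every node. Below is a minimal sketch in Python; the log directory, the file name pattern, and the retention count are assumptions to adjust, not Kudu defaults.

#!/usr/bin/env python3
# Keep only the N newest rotated Kudu log files in a log directory.
# Assumptions: logs live under LOG_DIR and follow a glog-style naming
# scheme; adjust LOG_DIR, PATTERN and KEEP for your deployment.
import glob
import os

LOG_DIR = "/var/log/kudu"               # assumption: your --log_dir
PATTERN = "kudu-tserver.*.log.INFO.*"   # assumption: rotated INFO files
KEEP = 10                               # number of newest files to keep

def prune_old_logs(log_dir: str, pattern: str, keep: int) -> None:
    # Sort matching files newest-first by modification time,
    # then delete everything beyond the first `keep` files.
    files = sorted(
        glob.glob(os.path.join(log_dir, pattern)),
        key=os.path.getmtime,
        reverse=True,
    )
    for path in files[keep:]:
        os.remove(path)

if __name__ == "__main__":
    prune_old_logs(LOG_DIR, PATTERN, KEEP)

Scheduled via cron (or an Ansible-managed timer) on every master and tablet server, something like this keeps the on-disk log footprint bounded regardless of how quickly the files roll.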
03-29-2019
04:10 AM
The Kudu rebalance tool crashes, both when I run it from the command line and when I run it via the Cloudera Manager UI.
Here is the stderr displayed in Cloudera Manager:
+ exec kudu cluster rebalance master1.domain.com,master2.domain.com,master3.domain.com --max_moves_per_server=5 --max_run_time_sec=0 --max_staleness_interval_sec=300
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
*** Aborted at 1553857242 (unix time) try "date -d @1553857242" if you are using GNU date ***
PC: @ 0x7fbfc3637207 __GI_raise
*** SIGABRT (@0x3ec00005c50) received by PID 23632 (TID 0x7fbfc5c83a00) from PID 23632; stack trace: ***
@ 0x7fbfc5642680 (unknown)
@ 0x7fbfc3637207 __GI_raise
@ 0x7fbfc36388f8 __GI_abort
@ 0x7fbfc3f467d5 __gnu_cxx::__verbose_terminate_handler()
@ 0x7fbfc3f44746 (unknown)
@ 0x7fbfc3f44773 std::terminate()
@ 0x7fbfc3f44993 __cxa_throw
@ 0x7fbfc3f99dd5 std::__throw_regex_error()
@ 0x931c32 std::__detail::_Compiler<>::_M_bracket_expression()
@ 0x931e3a std::__detail::_Compiler<>::_M_atom()
@ 0x932469 std::__detail::_Compiler<>::_M_alternative()
@ 0x9324c4 std::__detail::_Compiler<>::_M_alternative()
@ 0x932649 std::__detail::_Compiler<>::_M_disjunction()
@ 0x93297b std::__detail::_Compiler<>::_Compiler()
@ 0x932cb7 std::__detail::__compile<>()
@ 0x92bfc6 (unknown)
@ 0x92c664 std::_Function_handler<>::_M_invoke()
@ 0xde6672 kudu::tools::Action::Run()
@ 0x9957d7 kudu::tools::DispatchCommand()
@ 0x99619b kudu::tools::RunTool()
@ 0x8dee4d main
@ 0x7fbfc36233d5 __libc_start_main
@ 0x9284b5 (unknown)
I have already created an issue: https://issues.apache.org/jira/browse/KUDU-2753.
It is strange that there was no such issue in Jira yet. Has anybody faced this issue before?
Labels: Apache Kudu, Cloudera Manager
03-28-2019
08:07 AM
We already have 10+ GB of logs for the Kudu Masters and 30+ GB of logs for the Kudu Tablet Servers. How can I limit the number of log files to keep? The Kudu Configuration Reference mentions only a limit on the log file size (--max_log_size), but no limit on the number of log files.
Labels: Apache Kudu
03-20-2019
12:08 AM
16812 tablets. I see in the Known Issues that the recommended value is 1000 tablets per tablet server. Maybe this is the cause of the problem.
03-19-2019
02:39 AM
Hi,
Now the tablet server is out of memory:
Total consumption: 8.11G
Memory limit: 8.00G
Percentage consumed: 101.44%
Here is the heap sample from the failing node: https://sendeyo.com/up/d/d26504e2ef
Here is the output of the "kudu fs check" command:
$ kudu fs check -fs_wal_dir=/data/kudu -fs_data_dirs=/data/kudu
I0319 10:23:40.103991 157327 fs_manager.cc:260] Metadata directory not provided
I0319 10:23:40.104055 157327 fs_manager.cc:263] Using existing metadata directory in first data directory
W0319 10:23:40.104472 157327 data_dirs.cc:615] IO error: Could not lock /data/kudu/data/block_manager_instance: Could not lock /data/kudu/data/block_manager_instance: lock /data/kudu/data/block_manager_instance: Resource temporarily unavailable (error 11)
W0319 10:23:40.104492 157327 data_dirs.cc:616] Proceeding without lock
I0319 10:23:40.109122 157327 fs_manager.cc:397] Time spent opening directory manager: real 0.005s user 0.000s sys 0.000s
I0319 10:23:40.109161 157327 env_posix.cc:1634] Not raising this process' open files per process limit of 65535; it is already as high as it can go
I0319 10:23:40.109180 157327 file_cache.cc:470] Constructed file cache lbm with capacity 26214
I0319 10:23:50.317314 157332 log_block_manager.cc:2367] Opened 584 log block containers in /data/kudu/data
I0319 10:24:00.467881 157332 log_block_manager.cc:2367] Opened 700 log block containers in /data/kudu/data
I0319 10:24:10.991781 157332 log_block_manager.cc:2367] Opened 766 log block containers in /data/kudu/data
I0319 10:24:20.999274 157332 log_block_manager.cc:2367] Opened 882 log block containers in /data/kudu/data
I0319 10:24:25.138837 157332 log_block_manager.cc:2437] Read-only block manager, skipping repair
I0319 10:24:25.146647 157327 fs_manager.cc:417] Time spent opening block manager: real 45.037s user 0.000s sys 0.000s
I0319 10:24:25.146875 157327 fs_manager.cc:428] Opened local filesystem: /data/kudu
uuid: "d8b982ee9d6348a29153e4540c6c425a"
format_stamp: "Formatted at 2017-12-20 17:21:05 on ossvert4"
Block manager report
--------------------
1 data directories: /data/kudu/data
Total live blocks: 1933969
Total live bytes: 61130157258
Total live bytes (after alignment): 66793267200
Total number of LBM containers: 906 (354 full)
Total missing blocks: 0
Total orphaned blocks: 514 (0 repaired)
Total orphaned block bytes: 609251411 (0 repaired)
Total full LBM containers with extra space: 37 (0 repaired)
Total full LBM container extra space in bytes: 2317418496 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)
The heap sample contains a surprising entry: "kudu HdrHistogram Init 4502.0 (54.4%)". Maybe I am missing something, but it looks like a utility that collects usage metrics. Why did it consume more than 50% of the memory allocated to the tablet server?
03-18-2019
08:31 AM
I am trying to figure out why all three of my tablet servers run out of memory, but it is hard to do.
Configuration: 3 tablet servers, each with memory_limit_hard_bytes set to 8 GB. All Kudu operations are performed via Impala JDBC.
Symptoms: INSERT operations fail with errors like:
Service unavailable: Soft memory limit exceeded (at 101.27% of capacity
The tablet server web interface (http://<kudu-tablet-server>:8050/mem-trackers) shows that all the memory is consumed:
Total consumption: 8.08G
Memory limit: 8.00G
Percentage consumed: 100.98%
Another table on this web page displays details about memory consumption:
Id          Parent   Limit    Current Consumption   Peak consumption
root        none     none     1.00G                 1.01G
log_cache   root     1.00G    835.7K                2.59M
...         ...      ...      ...                   ...
However, when I sum up all the entries in the "Current Consumption" column, I get a much lower value: 2.6 GB. My question: who ate the remaining (8 - 2.6) = 5.4 GB?
How did I sum up all the entries?
- parse the HTML code of the web page and extract the table with the detailed memory consumption
- save this table as a CSV file
- read it in spark-shell
- convert all memory consumption values to bytes (from GB, MB, KB); a rough sketch of this step is shown below
- finally, calculate the sums for the "Current Consumption" and "Peak consumption" columns
Uh...
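For reference, here is roughly what that conversion and summation look like as a small Python script instead of spark-shell. It assumes the extracted table has been saved as mem_trackers.csv with a "Current Consumption" column; the file name and the unit suffixes (B/K/M/G) are assumptions based on how the web UI formats the values.

# Sum the "Current Consumption" column of the exported /mem-trackers table.
# Assumption: the table was saved as mem_trackers.csv with the column
# headers shown above; values look like '8.08G', '835.7K' or 'none'.
import csv
import re

UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(value: str) -> int:
    # Convert strings like '8.08G' or '835.7K' to bytes; anything that
    # does not look like a size (e.g. 'none') counts as 0.
    match = re.fullmatch(r"([\d.]+)\s*([BKMG]?)", value.strip())
    if not match:
        return 0
    number, unit = match.groups()
    return int(float(number) * UNITS.get(unit or "B", 1))

with open("mem_trackers.csv", newline="") as f:
    rows = list(csv.DictReader(f))

total = sum(to_bytes(row["Current Consumption"]) for row in rows)
print(f"Sum of 'Current Consumption': {total / 1024**3:.2f} GB")

This only reproduces the manual bookkeeping described above; if the trackers form a parent/child hierarchy (as the Parent column suggests), summing every row may count some memory more than once.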
Labels: Apache Impala, Apache Kudu
03-14-2019
11:45 PM
Cloudera pricing is per-node: https://www.cloudera.com/products/pricing.html
Is it only for service nodes, or for gateway nodes as well? For example, I have a cluster:
- HDFS NameNode: 1 node
- HDFS DataNode: 3 nodes
- HDFS gateway with my custom app using HDFS: 1 node
What is the number of nodes to pay for in this example?
Labels: Cloudera Manager
03-01-2019
12:45 AM
You have mentioned that NTP is not related to the problem. Let's consider this scenario:
1. The Impala Daemon is working with the READ_AT_SNAPSHOT setting enabled. The Impala daemon makes a read operation in Kudu and sets the read timestamp T1 immediately after the preceding write operation.
2. Kudu dispatches the read request to some replica R1. This replica R1 is running on a machine with poorly configured NTP, so the local time on this machine is 1 minute behind.
3. The replica R1 waits for the timeout specified by '--safe_time_max_lag_ms': 30 seconds. After the timeout, the local time is still (at best) 30 seconds behind T1.
Does this lead to the problem under discussion: 'Tablet is lagging too much to be able to serve snapshot scan'?
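To make the numbers concrete, here is a rough back-of-the-envelope model of this scenario in Python. It is not Kudu's actual implementation; the assumption, taken from the flag description quoted elsewhere in this thread, is that a snapshot scan at timestamp T1 can only be served while the replica's safe time lags T1 by no more than --safe_time_max_lag_ms. All names and numbers are illustrative.

# Toy model of the scenario above; NOT Kudu's real safe-time logic.
SAFE_TIME_MAX_LAG_MS = 30_000   # --safe_time_max_lag_ms
CLOCK_SKEW_MS = 60_000          # R1's local clock is 1 minute behind
WAIT_MS = 30_000                # how long R1 waits before responding

t1_ms = 0                       # read timestamp T1 (true time, arbitrary origin)

# R1's safe time tracks its skewed local clock; after waiting it has
# advanced by WAIT_MS but is still behind T1 by the remaining skew.
safe_time_ms = t1_ms - CLOCK_SKEW_MS + WAIT_MS
lag_ms = t1_ms - safe_time_ms

print(f"safe time lag after waiting: {lag_ms} ms "
      f"(--safe_time_max_lag_ms threshold: {SAFE_TIME_MAX_LAG_MS} ms)")
# With these numbers the remaining lag sits right at the threshold;
# whether that still triggers the 'Tablet is lagging too much' error
# is exactly the question asked above.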
02-28-2019
02:23 AM
Thank you for your detailed reply. Indeed, this lagging replica is located on a machine belonging to a different subnetwork. This might cause an additional network delay.
You mention the '--safe_time_max_lag_ms' parameter that controls the acceptable replica lag. I see in the Kudu Configuration Reference that this parameter is available both for kudu-master and kudu-tserver. Which of the two should I tune? The kudu-tserver one? What is the purpose of the other one, for kudu-master? The descriptions of both are identical and do not bring much clarity: 'The maximum amount of time we allow safe time to lag behind the requested timestamp before forcing the client to retry, in milliseconds.' Is it valid to have different '--safe_time_max_lag_ms' values for different kudu-tservers, so that the distant tserver has a higher max lag value?
Regarding READ_YOUR_WRITES mode, I have checked the Impala SQL Reference for CDH 6.1 and it does not provide this mode. Anyway, we are stuck with CDH 5.14, so we cannot even use the 'SET KUDU_READ_MODE ...' Impala statement. With CDH 5.14, the only option is to configure the Impala Daemon with '--kudu_read_mode=READ_AT_SNAPSHOT'.
Could you also explain how this new READ_YOUR_WRITES mode works? The API doc is not clear on this. Does this mean that the client automatically takes the read timestamp as the timestamp of the preceding write? In that case, what is the difference from the READ_AT_SNAPSHOT mode when the client does not specify the read timestamp at all?
Thanks