Member since: 12-17-2020
Posts: 23
Kudos Received: 2
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 76 | 11-23-2022 01:09 AM
 | 310 | 08-25-2022 05:48 AM
 | 302 | 08-24-2022 12:36 AM
11-23-2022
01:09 AM
Hello, I found the fix for this case; maybe it will help anyone having the same Kudu master consensus issue as me.

Master1 is not voting. The consensus matrix is:

Config source | Replicas | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
Master1 A | A B C | 12026 | -1 | Yes
Master2 B | A B C* | 12026 | -1 | Yes
Master3 C | A B C* | 12026 | -1 | Yes

The workaround is:

A) Stop the problematic master.

B) On the problematic master, delete its local master replica:
sudo -u kudu kudu local_replica delete --fs_wal_dir=/var/kudu/master --fs_data_dirs=/var/kudu/master 00000000000000000000000000000000 -clean_unsafe

C) Check which master is the current leader in the web UI:
a98a1f26d0254293b6e17e9daf8f6ef8 822fcc68eff448269c9200a8c4c2ecc8 LEADER 2022-11-22 07:18:21 GMT rpc_addresses { host: "sdzw-hpas-35" port: 7051 } http_addresses { host: "sdzw-hpas-35" port: 8051 } software_version: "kudu 1.13.0.7.1.6.0-297 (rev 9323384dbd925202032a965e955979d6d2f6acb0)" https_enabled: false

D) Copy the master replica from the current leader:
sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/wal/kudu/wal --fs_data_dirs=/wal/kudu/data 00000000000000000000000000000000 <active_leader_fqdn>:7051
In my case:
# sudo -u kudu /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/../lib/kudu/bin/kudu local_replica copy_from_remote --fs_wal_dir=/var/kudu/master --fs_data_dirs=/var/kudu/master 00000000000000000000000000000000 sdzw-hpas-35.nrtsz.local:7051

E) Stop the remaining two masters.

F) Start all three masters.
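To verify the fix, I run ksck against all three masters afterwards and check that the consensus matrix is clean (a quick check only; the host names and port here are just the defaults from my cluster):
# sudo -u kudu kudu cluster ksck Master1:7051,Master2:7051,Master3:7051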
11-14-2022
03:33 AM
1 Kudo
Hello, did you try using a load balancer like HAProxy? I'm using PostgreSQL as the HA internal database, but you can certainly set it up with both connections. Something like this:

frontend hive
    bind *:10000
    mode tcp
    option tcplog
    timeout client 50000
    default_backend hive_backend

backend hive_backend
    mode tcp
    balance source
    timeout connect 5000
    timeout server 50000
    server hiveserver1 Master1:10000
    server hiveserver2 Master2:10000
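If it helps, I always validate the file before reloading HAProxy (assuming the default config path):
# check the configuration syntax, then reload without dropping connections
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl reload haproxy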
11-12-2022
02:25 AM
Hello, in my 3-master cluster one Kudu master keeps starting and stopping. This is the log detail from Cloudera Manager:

Time | Log Level | Source | Log Message
10:14:41.417 AM WARN cc:288 Found duplicates in --master_addresses: the unique set of addresses is Master1:7051, Master2:7051, Master3:7051
10:15:11.823 AM WARN cc:254 Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.55:55402 (request call id 0) took 4542 ms (4.54 s). Client timeout 1775 ms (1.78 s)
10:15:11.823 AM WARN cc:254 Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.37:59796 (request call id 0) took 30215 ms (30.2 s). Client timeout 9654 ms (9.65 s)
10:15:11.823 AM WARN cc:260 Trace: 1112 10:15:07.281146 (+ 0us) service_pool.cc:169] Inserting onto call queue 1112 10:15:07.281169 (+ 23us) service_pool.cc:228] Handling call 1112 10:15:11.823245 (+4542076us) inbound_call.cc:171] Queueing success response Metrics: {"spinlock_wait_cycles":384}
10:15:11.823 AM WARN cc:260 Trace: 1112 10:14:41.607787 (+ 0us) service_pool.cc:169] Inserting onto call queue 1112 10:14:41.607839 (+ 52us) service_pool.cc:228] Handling call 1112 10:15:11.823242 (+30215403us) inbound_call.cc:171] Queueing success response Metrics: {}
10:15:11.823 AM WARN cc:254 Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.55:55402 (request call id 1) took 4536 ms (4.54 s). Client timeout 1955 ms (1.96 s)
10:15:11.823 AM WARN cc:260 Trace: 1112 10:15:07.286988 (+ 0us) service_pool.cc:169] Inserting onto call queue 1112 10:15:07.287025 (+ 37us) service_pool.cc:228] Handling call 1112 10:15:11.823244 (+4536219us) inbound_call.cc:171] Queueing success response Metrics: {}

What does this mean? Why is it so inconsistent?
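Given the "Found duplicates in --master_addresses" warning, one thing I'm double-checking is which flags each master actually started with (a quick check, assuming the default RPC port 7051):
# sudo -u kudu kudu master get_flags Master1:7051 | grep master_addresses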
Labels: Apache Kudu
11-11-2022
03:54 AM
Hello, in a CDP 7.1.6 + Cloudera Manager 7.3.1 cluster (3 masters + 3 workers) I'm getting this error all the time:

Corruption: master consensus error: there are master consensus conflicts

This is the cluster ksck:

Master Summary
UUID | Address | Status
----------------------------------+--------------------------+---------
5620e4a103894151b7bdee5e436f37d8 | master-2.local | HEALTHY
9cea3b56cc9b4be4846a02c0d89be753 | master-1.local | HEALTHY
a98a1f26d0254293b6e17e9daf8f6ef8 | master-3.local | HEALTHY
All reported replicas are:
A = 9cea3b56cc9b4be4846a02c0d89be753
B = 5620e4a103894151b7bdee5e436f37d8
C = a98a1f26d0254293b6e17e9daf8f6ef8
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
A | A B C | 10120 | -1 | Yes
B | A B* C | 10120 | -1 | Yes
C | A B* C | 10120 | -1 | Yes

It seems node A is not voting. This is the log output:

W1111 11:12:00.526211 18688 leader_election.cc:334] T 00000000000000000000000000000000 P 9cea3b56cc9b4be4846a02c0d89be753 [CANDIDATE]: Term 10122 pre-election: RPC error from VoteRequest() call to peer 5620e4a103894151b7bdee5e436f37d8 (master-2:7051): Network error: Client connection negotiation failed: client connection to 10.157.136.55:7051: connect: Connection refused (error 111)
W1111 11:12:22.683107 18688 leader_election.cc:334] T 00000000000000000000000000000000 P 9cea3b56cc9b4be4846a02c0d89be753 [CANDIDATE]: Term 10122 pre-election: RPC error from VoteRequest() call to peer 5620e4a103894151b7bdee5e436f37d8 (master-2:7051): Timed out: RequestConsensusVote RPC to 10.157.136.55:7051 timed out after 7.916s (SENT)

There is connectivity:

# nc -z -v 10.157.136.55 7051
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.157.136.55:7051.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

The masters have been restarted several times, as well as the whole cluster... Any idea how to fix this? Thanks!
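Since the log shows "Connection refused" at some moments and timeouts at others, one basic thing I keep checking on master-2 itself is whether the kudu-master process is actually listening when the errors happen (a plain OS-level check):
# on master-2
ss -ltnp | grep 7051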
Labels: Apache Kudu
08-25-2022
05:48 AM
Added the required dependencies to the repository:
glibc-2.17-326.el7_9.i686 --> this one was missing, which is why it tried to install 2.17-324
krb5-devel-1.15.1-50.el7.x86_64
openssl-devel-1.0.2k-21.el7_9.x86_64
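For anyone hitting the same thing, this is roughly how I checked which versions the repo actually offers before adding the missing package (glibc.i686 is just the example here):
yum --showduplicates list glibc.i686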
08-24-2022
09:10 AM
Yes, it should, but it doesn't. I'm already using the admin user from Cloudera Manager. This is all so weird...
08-24-2022
07:10 AM
Hello Scharan, thank you for your reply. Yes, I have read it a thousand times, but the problem is that the options (marked in RED) don't exist in my Yarn Queue Manager... you can see it in the first image shown 😞
08-24-2022
12:56 AM
Hello, it seems the main error is related to Impala. Kudu is balancing and responding well during the tests; the issue is that Impala loses the connection to whatever informs it where the new Kudu master LEADER is. I'm suspicious of the Cloudera Management services, which are already down. I will update with the solution whenever I have it.
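One of the things I'm checking is which Kudu masters the Impala daemons were actually started with (a rough check, assuming the default impalad debug web port 25000; run it against any coordinator, e.g. worker1):
curl -s http://worker1:25000/varz | grep kudu_master_hosts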
08-24-2022
12:52 AM
Many thanks for the recommendation. I will check again; it does seem to be something related to the repositories. Of course I will not downgrade glibc, this environment is too important to break 🙂
08-24-2022
12:49 AM
Hello, I'm trying to edit Yarn Queue Manager queues and it seems I'm not able to: the option to edit them is not there, and there is no information in the docs about it... It's supposed to show this options panel, but that is not my case. The only information I found is about adding the Queue Manager service to YARN, that's all... Any other ideas?
08-24-2022
12:36 AM
I'm finally using the binaries in the parcel dir:
# sudo -u kudu /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/kudu master list Master1,Master2,Master3
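On a CM-managed host the active parcel is usually also reachable through the /opt/cloudera/parcels/CDH symlink, which saves typing the versioned path (just a convenience, assuming the symlink exists on your install):
# sudo -u kudu /opt/cloudera/parcels/CDH/bin/kudu master list Master1,Master2,Master3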
08-18-2022
12:17 AM
Hi, first of all, many thanks for your comments. I read that downgrading glibc could leave the system in an undesired state :S Did you actually downgrade glibc? I don't care about the Kerberos packages, I'm not using Kerberos... Best Regards
08-17-2022
03:00 AM
Good morning. While trying to install Cloudera Manager agents on RHEL 7.9 (which should be supported according to the Cloudera support matrix), I found out that it is not supported?! The packages installed on the server are newer, as expected (they come from RHEL 7.9), and Cloudera Manager is looking for older versions... What is the point of having a support matrix if it is not applied? Any idea? Downgrading dependencies is not an option.
Labels: Cloudera Manager
03-30-2022
01:53 AM
Hello, does anyone know if there is a reference table for these errors? I just want to know what this means: Unable to initialize the Kudu scan node. No relevant traces were found in the following logs: Impala Daemon, Impala Catalog Server, Impala State Store, Kudu Master Leader, Kudu tablet, Hive Metastore, Hive Server2. I'm running out of resources 😞
03-24-2022
08:22 AM
Same problem here: 3 Kudu masters, and when 1 is down... consensus problem. 2 of 3 are supposed to be a majority to elect the new leader. Note that the leader is elected:

uuid | rpc-addresses | role
---------------------------------+--------------+----------
daa9129e78245be2aaa7e5e649cc1dc8 | master1:7051 | LEADER
8a168b68a6dd587c952419672ba32088 | master3:7051 | FOLLOWER

but queries against Kudu are not working (ImpalaRuntimeException: Unable to initialize the Kudu scan node) ... 😕
03-24-2022
05:44 AM
Hello, ksck is showing that the tables are OK (Recovering | Under-replicated | Unavailable are all = 0).

W0324 12:15:41.325619 18080 negotiation.cc:313] Failed RPC negotiation. Trace: 0324 12:15:40.627405 (+ 0us) reactor.cc:609] Submitting negotiation task for client connection to master2:7051 0324 12:15:40.627616 (+ 211us) negotiation.cc:98] Waiting for socket to connect 0324 12:15:41.325243 (+697627us) negotiation.cc:304] Negotiation complete: Network error: Client connection negotiation failed: client connection to master2:7051: connect: No route to host (error 113) Metrics: {"client-negotiator.queue_time_us":187,"thread_start_us":157,"threads_started":1}
W0324 12:15:44.329090 18080 negotiation.cc:313] Failed RPC negotiation. Trace: 0324 12:15:41.325954 (+ 0us) reactor.cc:609] Submitting negotiation task for client connection to master2:7051 0324 12:15:41.326027 (+ 73us) negotiation.cc:98] Waiting for socket to connect 0324 12:15:44.329031 (+3003004us) negotiation.cc:304] Negotiation complete: Timed out: Client connection negotiation failed: client connection to master2:7051: Timeout exceeded waiting to connect Metrics: {"client-negotiator.queue_time_us":38}
W0324 12:15:44.331089 18080 negotiation.cc:313] Failed RPC negotiation. Trace: 0324 12:15:44.329518 (+ 0us) reactor.cc:609] Submitting negotiation task for client connection to master2:7051 0324 12:15:44.329580 (+ 62us) negotiation.cc:98] Waiting for socket to connect 0324 12:15:44.331065 (+ 1485us) negotiation.cc:304] Negotiation complete: Network error: Client connection negotiation failed: client connection to master2:7051: connect: No route to host (error 113) Metrics: {"client-negotiator.queue_time_us":36}

Master Summary
UUID | Address | Status
----------------------------------------+----------------------------+-------------
8a168b68a6dd487c952419672ba32088 | master3.server.com | HEALTHY
daa9129e78244be2aaa7e5e649cc1dc8 | master1.server.com | HEALTHY
< unknown > (master2.server.com) | master2.server.com | UNAVAILABLE
Error from master2.server.com: Network error: Client connection negotiation failed: client connection to master2:7051: connect: No route to host (error 113) (UNAVAILABLE)
All reported replicas are:
A = daa9129e78244be2aaa7e5e649cc1dc8
B = < unknown > (master2.server.com)
C = 8a168b68a6dd487c952419672ba32088
D = 1f02c618009c44d381c55841dcb5a498
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+------------------------+--------------+--------------+------------
A | A* C D | 88 | -1 | Yes
B | [config not available] | | |
C | A* C D | 88 | -1 | Yes

Flags of checked categories for Master:
Flag | Value | Master
---------------------+-------------------------------------------------------------+-----------------------------------------
builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | master1.server.com, master3.server.com
time_source | system | master1.server.com, master3.server.com

Tablet Server Summary
UUID | Address | Status | Location | Tablet Leaders | Active Scanners
----------------------------------+---------------------------------+---------+----------+----------------+-----------------
0dcf7da9a99c43dd99f956a95abb773d | worker3.server.com:7050 | HEALTHY | /default | 85 | 0
345166476e6440b4a957d76dd1de947e | worker2.server.com:7050 | HEALTHY | /default | 314 | 0
9e30289cb8d244e0a7fb822897fc7c29 | worker1.server.com:7050 | HEALTHY | /default | 1051 | 0

Tablet Server Location Summary
Location | Count
----------+---------
/default | 3

Flags of checked categories for Tablet Server:
Flag | Value | Tablet Server
---------------------+-------------------------------------------------------------+-------------------------
builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | all 3 server(s) checked
time_source | system | all 3 server(s) checked

Version Summary
Version | Servers
--------------------+--------------------------------------------------------------------------------------------------------------------
1.13.0.7.1.6.0-297 | master@master3.server.com, master@master1.server.com, tserver@worker3.server.com:7050, and 2 other server(s)

Tablet Summary
Summary by table
Name | RF | Status | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
------+----+--------+---------------+---------+------------+------------------+-------------

Tablet Replica Count Summary
Statistic | Replica Count
----------------+---------------
Minimum | 1450
First Quartile | 1450
Median | 1450
Third Quartile | 1450
Maximum | 1450

Total Count Summary
 | Total Count
----------------+-------------
Masters | 3
Tablet Servers | 3
Tables | 109
Tablets | 1450
Replicas | 4350

==================
Warnings:
==================
master unusual flags check error: 1 of 3 masters were not available to retrieve unusual flags
master diverged flags check error: 1 of 3 masters were not available to retrieve time_source category flags

==================
Errors:
==================
Network error: error fetching info from masters: failed to gather info from all masters: 1 of 3 had errors
Corruption: master consensus error: there are master consensus conflicts

What I don't understand is why I'm getting a consensus error when 2 of 3 masters are UP and all 3 tablet servers are UP. Many thanks for your help.
03-23-2022
07:02 AM
Hi Cloudera gurus,
This is my CDP.
3 Master Nodes+3 Worker Nodes
HA enabled and testing it.
Here is the issue: when I shut down Master 2, some queries randomly fail, showing this:
# impala-shell -i haproxy-server.com -q "use dbschema; select * from table_foo limit 10;"
Starting Impala Shell without Kerberos authentication
Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
Opened TCP connection to haproxy-server.com:21000
Connected to haproxy-server.com:21000
Server version: impalad version 3.4.0-SNAPSHOT RELEASE (build 0cadcf7ac76ecec87d9786048db3672c37d41c6f)
Query: use dbschema
Query: select * from table_foo limit 10
Query submitted at: 2022-03-23 11:28:37 (Coordinator: http://worker1:25000)
ERROR: ImpalaRuntimeException: Unable to initialize the Kudu scan node
CAUSED BY: AnalysisException: Unable to open the Kudu table: dbschema.table_foo
CAUSED BY: NonRecoverableException: cannot complete before timeout: KuduRpc(method=GetTableSchema, tablet=Kudu Master, attempt=1, TimeoutTracker(timeout=180000, elapsed=180004), Trace Summary(0 ms): Sent(1), Received(0), Delayed(0), MasterRefresh(0), AuthRefresh(0), Truncated: false
Sent: (master-192.168.1.10:7051, [ GetTableSchema, 1 ]))
Could not execute command: select * from table_foo limit 10
The thing is that all leaders are correctly re-balanced to other nodes and something is still working, because most queries succeed.
Does someone have any clue? I was thinking about HiveServer, but I'm not sure how to trace it.
Note: as CM is on Master 2, it is also unavailable (this is not the cause; other tests have been done with CM out of service and queries worked fine).
Note 2: does it matter that the Kudu master was on Master 2?
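For reference, this is how I check where the Kudu leader ended up after stopping Master 2 (a rough check; the host names are mine and the -columns option may vary by Kudu CLI version):
# sudo -u kudu kudu master list master1:7051,master2:7051,master3:7051 -columns=uuid,rpc-addresses,role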
Many thanks in advance for your help.
Best Regards
Tags: CDP, error, impala.kudu
03-02-2022
09:01 AM
1 Kudo
Wow, this worked perfectly! You made my day, my best wishes to you my friend! 😎
... View more
03-02-2022
03:29 AM
Hello,
I recently discovered the CM REST API and how I can restart services and obtain information just from the Linux console, without using the CM UI.
I'm looking for some examples of deleting and creating the Impala Catalog Server and State Store (on a different node) using only this feature.
I did not find any information about it on the internet, just a few poor YouTube videos covering simple actions.
I'm using
CDP 7.1.6
CM 7.3.1
API v43
This is the Cloudera documentation I could get from my installation
http://my-CMhost.com:7180/static/apidocs/resource_RolesResource.html
I would really appreciate if anyone used this and could share some examples 🙂
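To be concrete, this is roughly the call sequence I have in mind, based on the RolesResource docs linked above (a sketch only; the cluster name, role name, host id and credentials are placeholders I made up):
# list the Impala role instances and the hosts they run on
curl -u admin:admin "http://my-CMhost.com:7180/api/v43/clusters/MyCluster/services/impala/roles"
# delete an existing Catalog Server role instance
curl -u admin:admin -X DELETE "http://my-CMhost.com:7180/api/v43/clusters/MyCluster/services/impala/roles/impala-CATALOGSERVER-1"
# create it again on a different node (the hostId comes from GET /api/v43/hosts)
curl -u admin:admin -X POST -H "Content-Type: application/json" \
  -d '{"items":[{"type":"CATALOGSERVER","hostRef":{"hostId":"<host-id-of-the-new-node>"}}]}' \
  "http://my-CMhost.com:7180/api/v43/clusters/MyCluster/services/impala/roles"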
Labels: Apache Impala, Cloudera Manager
02-19-2022
02:42 AM
Hello, this command is not working; the binaries run under the parcel installation and they are not registered with systemctl. I've tried executing the binary from /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/lib/kudu/sbin/kudu-tserver but it doesn't work properly.

root@server:~# sudo /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/lib/kudu/sbin/kudu-tserver --help
kudu-tserver: Warning: SetUsageMessage() never called

Flags from ../../../../../src/kudu/cfile/block_cache.cc:
-block_cache_capacity_mb (block cache capacity in MB) type: int64 default: 512
-block_cache_type (Which type of block cache to use for caching data. Valid choices are 'DRAM' or 'NVM'. DRAM, the default, caches data in regular memory. 'NVM' caches data in a memory-mapped file using the memkind library. To use 'NVM', libmemkind 1.8.0 or newer must be available on the system; otherwise Kudu will crash.) type: string default: "DRAM"
-force_block_cache_capacity (Force Kudu to accept the block cache size, even if it is unsafe.) type: bool default: false
[...]
02-15-2022
11:57 PM
Good morning and thanks for your help. I already read that page, but the command is not registered as a systemctl service:

# sudo service kudu-tserver restart
Redirecting to /bin/systemctl restart kudu-tserver.service
Failed to restart kudu-tserver.service: Unit not found.

I checked the kudu-tserver process and it was started from the parcel in /opt:

kudu 30973 30965 99 07:47 ? 00:05:51 /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/lib/kudu/sbin/kudu-tserver --tserver_master_addrs=master1,master2,master3 --flagfile=/var/run/cloudera-scm-agent/process/1546342994-kudu-KUDU_TSERVER/gflagfile

I tried executing this kudu-tserver binary as well, but it doesn't work:

# sudo /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/lib/kudu/sbin/kudu-tserver stop
usage: /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/lib/kudu/sbin/kudu-tserver
02-15-2022
03:48 AM
Hello, I'm using a small 3-master + 3-worker cluster [Cloudera Manager 7.3.1 (Runtime 7.1.6)] with PostgreSQL HA configured (not embedded). The thing is that I need to restart a tablet server when the CM host is down, and obviously I can't do it through CM. Is there any way to restart these Kudu tablet servers without CM? NOTE: I've already tried sudo service kudu-tserver stop, but: Failed to stop kudu-tserver.service: Unit kudu-tserver.service not loaded.
Labels: