Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1985 | 06-15-2020 05:23 AM |
| | 16295 | 01-30-2020 08:04 PM |
| | 2136 | 07-07-2019 09:06 PM |
| | 8320 | 01-27-2018 10:17 PM |
| | 4719 | 12-31-2017 10:12 PM |
08-19-2022
04:12 AM
Hi all,

We have an HDP cluster with 528 DataNode machines. In Ambari HDFS Configs we configured 3 config groups, because:

1) 212 DataNode machines have 32G of memory
2) 119 DataNode machines have 64G of memory
3) 197 DataNode machines have 128G of memory

So in Ambari we have the corresponding config group settings. Now we need to configure the parameter "DataNode maximum Java heap size" (`dtnode_heapsize`) according to each machine's memory, so we want to set the following:

- on the 212 DataNode machines with 32G, DataNode maximum Java heap size will be set to 10G
- on the 119 DataNode machines with 64G, DataNode maximum Java heap size will be set to 15G
- on the 197 DataNode machines with 128G, DataNode maximum Java heap size will be set to 20G

In order to configure `dtnode_heapsize` on each config group, we tried to use the configs.py tool:

```
/var/lib/ambari-server/resources/scripts/configs.py --user=admin --password=admin --port=8080 --action=set --host=ambari_server_node --cluster=hdp_cluster7 --config-type=hadoop-env -k "dtnode_heapsize" -v "10000"
```

The above CLI configures the parameter `dtnode_heapsize` to 10G (10000M). When we run it, `dtnode_heapsize` is updated, but not on the groups! What gets updated is the parameter in the default group, "Default".

So how do we set `dtnode_heapsize` on the relevant config group? We are not sure that configs.py supports configuration of config groups; in that case we may need another approach.

Note: the target is to automate the settings in Ambari by API/REST API or scripts, so manual changing isn't acceptable.
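As far as we know, configs.py writes to the cluster's desired config, which is why only the "Default" group changes. Config groups are a separate REST resource (`/api/v1/clusters/<cluster>/config_groups`), so one possible approach is to update each group's override through that endpoint. Below is a minimal sketch reusing the credentials and cluster name from the question; the group id, group name, and the exact `desired_configs` payload shape are assumptions that should be verified against the Ambari REST API documentation for your version:

```bash
# 1) List config groups to find the id of the group to change
#    (e.g. the 32G DataNode group):
curl -u admin:admin -H 'X-Requested-By: ambari' \
  "http://ambari_server_node:8080/api/v1/clusters/hdp_cluster7/config_groups?fields=ConfigGroup"

# 2) PUT a new hadoop-env override onto that group (id 2 and the group name
#    "datanodes_32g" are hypothetical). The payload shape is our assumption;
#    hosts already attached to the group may need to be repeated in the body,
#    depending on the Ambari version.
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  "http://ambari_server_node:8080/api/v1/clusters/hdp_cluster7/config_groups/2" \
  -d '[{"ConfigGroup":{
        "cluster_name":"hdp_cluster7",
        "group_name":"datanodes_32g",
        "tag":"HDFS",
        "description":"32G datanodes",
        "desired_configs":[{
          "type":"hadoop-env",
          "tag":"override_32g_v1",
          "properties":{"dtnode_heapsize":"10000"}}]}}]'
```

Repeating the PUT with 15000 and 20000 against the other two group ids would cover the 64G and 128G groups.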
Labels:
- Ambari Blueprints
12-23-2021
05:01 AM
We have a Hadoop cluster that includes `datanode` machines and 5 `kafka` machines. The Kafka machines are installed as part of the Hortonworks packages; the `kafka` version is 0.1X.

We run the `deeg_data` applications on the `datanode` machines as executors, consuming data from `kafka` topics. So the `deeg_data` applications consume data from topic partitions that exist on the Kafka cluster (`deeg_data` uses a `kafka` client for consuming).

In the last days we saw that our application `deeg_data` failed, and we started to look for the root cause. On the `kafka` cluster we see the following behavior:

```
/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --group deeg_data --describe --bootstrap-server kafka1:6667
To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files>
where num_of_file > 0 GC log rotation is turned off
Consumer group 'deeg_data' is rebalancing
```

From the `kafka` side the cluster is healthy, all topics are balanced, and all Kafka brokers are up and registered correctly in ZooKeeper.

After some time (a couple of hours) we ran the command again, this time without the error about `Consumer group 'deeg_data' is rebalancing`, and we got the following correct results:

```
/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --group deeg_data --describe --bootstrap-server kafka1:6667
To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files>
where num_of_file > 0 GC log rotation is turned off
GROUP     TOPIC            PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG     OWNER
deeg_data pot.sdr.proccess 0         6397256247     6403318505     6062258 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 1         6397329465     6403390955     6061490 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 2         6397314633     6403375153     6060520 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 3         6397258695     6403320788     6062093 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 4         6397316230     6403378448     6062218 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 5         6397325820     6403388053     6062233 consumer-1_/10.3.6.237
. . . .
```

So we want to understand why we get `Consumer group 'deeg_data' is rebalancing`. What is the reason for the above state, and why do we get `rebalancing`?
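For context: a Kafka consumer group rebalances whenever group membership changes, i.e. a consumer instance joins, leaves, crashes, stops heartbeating within `session.timeout.ms`, or spends longer than `max.poll.interval.ms` between two `poll()` calls. Since the group settles again after a couple of hours with a single owner (`consumer-1_/10.3.6.237`), a restarting or slow-polling `deeg_data` instance is a plausible trigger. These are the standard consumer settings that govern this behavior, as a sketch to be checked against the `deeg_data` consumer configuration (values are illustrative only; `max.poll.interval.ms` exists from Kafka 0.10.1 on):

```
# Standard Kafka consumer configs that govern group membership
# (values illustrative, not recommendations).
# Consumer is evicted if no heartbeat arrives within this window:
session.timeout.ms=30000
heartbeat.interval.ms=10000
# Maximum time between poll() calls before the consumer is evicted:
max.poll.interval.ms=300000
# Smaller batches keep the poll loop fast enough to stay under the limit:
max.poll.records=500
```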
Labels:
- Apache Kafka
12-20-2021
01:57 PM
We have 3 Kafka brokers on Linux RHEL 7.6 (3 Linux machines). The Kafka version is 2.7.X, and the broker IDs are `1010,1011,1012`.

From the Kafka topic describe output we can see the following:

```
Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 2 Leader: 1011 Replicas: 1010,1012,1011 Isr: 1011,1012
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
Topic: __consumer_offsets Partition: 4 Leader: 1011 Replicas: 1012,1010,1011 Isr: 1011
Topic: __consumer_offsets Partition: 5 Leader: none Replicas: 1010,1011,1012 Isr: 1010
```

From the ZooKeeper CLI we can see that broker id `1010` is not defined:

```
[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]
```

And in `state-change.log` we can see the following:

```
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)
```

By `ls -ltr` we can see that `controller.log` and `state-change.log` have not been updated since `Dec 16`:

```
-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka  68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka   6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka    323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka  15669106 Dec 20 19:50 kafkaServer.out
```

What we did until now: we restarted all 3 ZooKeeper servers and we restarted all Kafka brokers, but Kafka broker `1010` still appears with `Leader: none` and is still missing from the ZooKeeper data.

**Additional info**

```
[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0
```

**From kafka01**

```
more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010
```

**Relevant ideas**

In the topics disk we have the following files (in addition to the topic partitions):

```
-rw-r--r-- 1 root kafka    91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka   161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka  1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka    80 Dec 17 09:42 log-start-offset-checkpoint
```

Any idea whether deleting one or more of the above files could help with our issue?
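Before deleting any of the checkpoint files (they carry recovery state, so removing them is risky), it may be worth confirming the "offline log directory" theory from `state-change.log`: a broker that marks its log dir as failed stops serving those partitions, which would fit broker `1010` never rejoining. A sketch using the stock Kafka 2.7 tooling; the install path, log path, and bootstrap address are placeholders:

```bash
# Ask the cluster which log directories each broker reports; an offline dir
# shows an error for that broker instead of partition sizes
# (broker ids taken from the question):
bin/kafka-log-dirs.sh --bootstrap-server kafka1:9092 \
  --describe --broker-list 1010,1011,1012

# On the kafka01 host, check server.log around Dec 16 for the log-dir failure
# message, and verify filesystem health/permissions of the configured log.dirs:
grep -i "offline log directory\|Stopping serving logs" /var/log/kafka/server.log*
```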
Labels:
- Apache Kafka
11-24-2021
01:10 PM
We have an HDP cluster (Hadoop cluster version 2.6.5), and we added Ranger and Ranger KMS to the cluster as services. After adding the Ranger KMS service and doing some settings, we performed the following:

```
[hdfs@worker01 tmp]$ hdfs dfs -mkdir /zone_encr_1
[hdfs@worker01 tmp]$ hdfs crypto -createZone -keyName secret_hdp1 -path /zone_encr_1
Added encryption zone /zone_encr_1
[hdfs@worker01 tmp]$ hdfs dfs -copyFromLocal file.txt /zone_encr_1
[hdfs@worker01 tmp]$ hdfs dfs -cat /zone_encr_1/file.txt
hello every one
[hdfs@worker01 tmp]$ hdfs dfs -ls /zone_encr_1/file.txt
-rw-r--r-- 2 hdfs hdfs 23 2021-11-24 20:19 /zone_encr_1/file.txt
[hdfs@worker01 tmp]$ hdfs crypto -listZones
/zone_encr      secret_hdp1
/zone_encr_new  secret_hdp1
/zone_encr_1    secret_hdp1
```

As we can see above: first we created the folder /zone_encr_1 in HDFS, then we added encryption to the folder /zone_encr_1, and then we copied the file file.txt, which includes the text "hello every one", from the local folder to the HDFS folder /zone_encr_1.

Then we did the test with `hdfs dfs -cat /zone_encr_1/file.txt`. We expected to get an encrypted file, but we did not; we still get the file as "hello every one".

Since I am just learning the Ranger KMS capabilities, I am not sure if I missed something. https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/copy-to-from-encr-zone.html
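For what it's worth, this looks like HDFS transparent encryption working as designed: an authorized HDFS client decrypts data automatically on read, so `-cat` on the normal path is expected to show plaintext. The data at rest is still encrypted; the documented way to see the raw stored bytes is the `/.reserved/raw` namespace, which bypasses decryption and requires the HDFS superuser. A short check, reusing the paths from the question:

```bash
# Normal path: the client decrypts transparently, so this prints plaintext.
hdfs dfs -cat /zone_encr_1/file.txt

# /.reserved/raw returns the stored bytes without decryption, so this
# should print ciphertext (must be run as the HDFS superuser):
hdfs dfs -cat /.reserved/raw/zone_encr_1/file.txt
```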
Labels:
- Apache Ranger
09-02-2021
04:27 AM
Hi all,
We have a Kafka server with version 2.6 on a Linux RHEL machine.
What is wrong with the following Kafka CLI?

```
kafka-topics.sh --bootstrap-server="kafka1:6667" --describe | more
Error while executing topic command : The broker does not support DESCRIBE_CONFIGS
[2021-09-02 11:19:02,486] ERROR org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support DESCRIBE_CONFIGS
 (kafka.admin.TopicCommand$)
```

Is the above CLI a valid approach?
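One observation (not a confirmed diagnosis): `UnsupportedVersionException: The broker does not support DESCRIBE_CONFIGS` usually means the `kafka-topics.sh` client is newer than the broker it reached; the DESCRIBE_CONFIGS API only exists on brokers from roughly 0.11 onward, and port 6667 is the usual HDP Kafka broker port, so it is worth double-checking which broker version actually listens on kafka1:6667. If it really is an old HDP-era broker, the legacy ZooKeeper-based form of the command (still present in the 2.6 client, removed in 3.0) avoids that API:

```bash
# Legacy form: describes topics via ZooKeeper instead of the broker API
# (zk1:2181 is a placeholder for the actual ZooKeeper address):
kafka-topics.sh --zookeeper zk1:2181 --describe
```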
Labels:
- Apache Kafka
07-17-2021
11:43 PM
The logs include some sensitive data, so I can't attach all of the log content, but the lines I posted are the ones that appear most frequently in the logs. I also forgot to mention another problem: we also can't access port 8088.
07-01-2021
01:29 AM
Based on the logs and on what you see, which scheduler is preferred for us to use: CapacityScheduler or FairScheduler?
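For reference (not a recommendation either way): in stock Hadoop 2.x the scheduler is selected by a single yarn-site property, so switching between the two is a configuration change rather than a reinstall. The property and class names below are the standard Hadoop ones:

```
# yarn-site.xml (yarn-site in Ambari); CapacityScheduler is the HDP default:
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
# FairScheduler alternative:
# yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
```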
06-30-2021
05:10 AM
Can you please share the link/doc that describes the above table?
06-30-2021
04:56 AM
and `yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled` --> false
06-30-2021
04:55 AM
Here are the details:

```
capacity-scheduler=null
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default
```