Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11

My Accepted Solutions
Title | Views | Posted |
---|---|---|
  | 1482 | 06-15-2020 05:23 AM |
  | 9492 | 01-30-2020 08:04 PM |
  | 1630 | 07-07-2019 09:06 PM |
  | 6774 | 01-27-2018 10:17 PM |
  | 3808 | 12-31-2017 10:12 PM |
10-28-2022
02:24 AM
We have an Ambari-managed Hadoop cluster (HDP version 2.6.5); all machines in the cluster run RHEL 7.9. The cluster of course includes the YARN service, with two ResourceManager instances running on the master1 and master2 nodes. We are facing a problem with the following alert:

Connection failed to http://master2.start.com:8088 (timed out)

We tested the alert with the `wget` approach below. While the alert is shown in Ambari, the `wget` test hangs, and sometimes it takes a long time until `wget` finishes with results:

[root@master2 yarn]# wget http://master2.start.com:8088
--2022-10-28 08:12:49-- http://master2.start.com:8088/
Resolving master2.start.com (master2.start.com)... 172.3.45.68
Connecting to master2.start.com (master2.start.com)|172.3.45.68|:8088... connected.
HTTP request sent, awaiting response... 307 TEMPORARY_REDIRECT
Location: http://master1.start.com:8088/ [following]
--2022-10-28 08:12:50-- http://master1.start.com:8088/
Resolving master1.start.com (master1.start.com)... 172.3.45.61
Connecting to master1.start.com (master01.start.com)|172.3.45.61|:8088... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://master1.start.com:8088/cluster [following]
--2022-10-28 08:12:50-- http://master1.start.com:8088/cluster
Reusing existing connection to master1.start.com:8088.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.35’
[ <=> ] 5,419,141 9.46MB/s in 0.5s
2022-10-28 08:12:52 (9.46 MB/s) - ‘index.html.35’ saved [5419141]

Port 8088 is listening on both nodes:

ps -ef | grep `lsof -i :8088 | grep -i listen | awk '{print $2}'`
yarn 1977 1 16 Oct27 ? 02:37:32 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_resourcemanager

We also checked with `jps`:

jps -l | grep -i resourcemanager
1977 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

We also verified the ResourceManager logs and see the following:

2022-10-27 08:04:30,071 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.api.records.ContainerId.toString(ContainerId.java:196)
at org.apache.hadoop.yarn.util.ConverterUtils.toString(ConverterUtils.java:165)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:169)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:603)
at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)

2022-10-27 08:04:32,056 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
(same stack trace as above, continuing into com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch)

2022-10-27 08:05:43,170 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(992)) - State store operation failed
org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1213)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1001)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)

2022-10-27 08:05:49,584 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(659)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2022-10-28 08:43:31,259 ERROR metrics.SystemMetricsPublisher (SystemMetricsPublisher.java:putEntity(549)) - Error when publishing entity [YARN_APPLICATION,application_1664925617878_1896]
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream closed.
at com.sun.jersey.api.client.ClientResponse.bufferEntity(ClientResponse.java:583)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:157)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:348)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:536)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationACLsUpdatedEvent(SystemMetricsPublisher.java:392)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:257)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:564)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:559)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Stream closed.
at java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:470)
at java.net.SocketInputStream.available(SocketInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)

It is still not clear from the above logs why the ResourceManagers raise the alert - `Connection failed to http://master2.start.com:8088 (timed out)`.
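Since the `wget` from master2 is answered with a redirect to master1 (the active ResourceManager), the alert may simply mean the standby RM's web endpoint on master2 is intermittently slow rather than down. A minimal sketch for timing both endpoints directly, assuming the standard RM REST path on port 8088 (the 15-second timeout is an arbitrary illustration, not the Ambari alert's actual threshold):

```bash
# Hypothetical check: time each ResourceManager's own REST endpoint, without
# following redirects, to see whether responses approach the alert timeout.
for host in master1.start.com master2.start.com; do
  echo "== $host =="
  # On the active RM this returns cluster info (including the HA state);
  # the standby should answer quickly with a 307 redirect.
  curl -sS -o /dev/null -w 'http_code=%{http_code} time_total=%{time_total}s\n' \
       --max-time 15 "http://$host:8088/ws/v1/cluster/info"
done
```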
Labels:
- Apache Hadoop
10-28-2022
01:50 AM
Since we manage the cluster with Ambari, do you mean that we need to find the GC settings in the yarn-env template?
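For context, this is roughly the kind of GC-logging option that gets appended to the ResourceManager JVM options inside the yarn-env template - a sketch only; the exact export line and variable names in your yarn-env template may differ, and the log path is a placeholder:

```bash
# Hypothetical snippet for the yarn-env template (Ambari > YARN > Configs > Advanced yarn-env).
# Appends standard JDK 8 GC-logging flags to the ResourceManager options so GC pauses
# can be inspected; the variable name and log path are assumptions, adapt to your template.
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop-yarn/yarn/rm-gc.log \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M"
```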
10-28-2022
12:32 AM
We have an old Hadoop cluster based on Hortonworks HDP 2.6.4. The cluster includes two NameNode services, one standby NameNode and one active NameNode. All machines in the cluster run RHEL 7.2, and we do not see any problem at the OS level. The cluster also includes 12 worker machines (each worker runs the DataNode and NodeManager services).

The story began when we got alerts from the smoke-test script complaining about "Detected pause in JVM or host machine" on the standby NameNode. Based on that, we decided to increase the NameNode heap size from 60G to 100G. That setting was based on a table showing how much memory to configure according to the number of files in HDFS, and according to the table we set the NameNode heap size to 100G and then restarted the HDFS service.

After HDFS was completely restarted, we still see the messages about `Detected pause in JVM or host machine`, which is really strange because we almost doubled the NameNode heap size. So we started deeper testing, for example with `jstat`. From `jstat` we get very low FGCT values, which is really good and does not point to a NameNode heap problem (1837 is the NameNode PID):

# /usr/jdk64/jdk1.8.0_112/bin/jstat -gcutil 1837 10 10
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
 0.00   1.95  32.30  34.74  97.89      -    197  173.922     2    1.798  175.720
(the same values were reported for all 10 samples)

And here are the messages from the NameNode logs:

2022-10-27 14:04:49,728 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2044ms
2022-10-27 16:21:33,973 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2524ms
2022-10-27 17:31:35,333 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2444ms
2022-10-27 18:55:55,387 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2134ms
2022-10-27 19:42:00,816 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2153ms
2022-10-27 20:50:23,624 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2050ms
2022-10-27 21:07:01,240 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2343ms
2022-10-27 23:53:00,507 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2120ms
2022-10-28 00:43:30,633 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1811ms
2022-10-28 00:53:35,120 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2192ms
2022-10-28 02:07:39,660 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2353ms
2022-10-28 02:49:25,018 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1698ms
2022-10-28 03:00:20,592 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2432ms
2022-10-28 05:02:15,093 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approximately 2016ms
2022-10-28 06:52:46,672 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(196)) - Detected pause in JVM or host machine (eg GC): pause of approxim

As we can see, a "pause in JVM or host machine" message appears every 1-2 hours. We checked the number of files in HDFS and it is about 7 million files. So what else can we do? We could increase the NameNode heap size a little more, but my feeling is that the heap size is already really enough.
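Since `jstat` shows almost no full-GC time, the pauses may be coming from the host rather than the heap (the JvmPauseMonitor message explicitly says "JVM or host machine"). A small sketch of host-level checks that are often worth running on the standby NameNode, assuming standard RHEL tooling:

```bash
# Hypothetical host-level checks on the standby NameNode.

# Transparent Huge Pages compaction is a classic source of multi-second stalls for
# large-heap Java processes; [always] means it is enabled.
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Is the host swapping? Non-zero si/so columns while the pauses occur are suspicious.
free -g
vmstat 5 5

# Is the NameNode process itself touching swap? (1837 = NameNode PID from the post)
grep -i VmSwap /proc/1837/status
```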
Labels:
- Apache Hadoop
09-12-2022
03:57 AM
Hi all,

We have an Ambari HDP cluster (HDP version 2.6.4) with 420 worker Linux machines (each worker runs the DataNode and NodeManager services). Unfortunately the Ambari DB is damaged and we do not have an Ambari DB dump, so we cannot recover the Ambari DB - effectively we no longer have Ambari or the Ambari GUI.

However, the HDFS disks on the worker machines still contain the HDFS data, and the NameNode is still working with all of its data (journal/hdfsha/current/ and namenode/current), so HDFS keeps working without Ambari.

Given all of the above: is it possible to install a new Ambari cluster from scratch and then add the existing, working HDFS data to that cluster? Do Hortonworks / Cloudera have a procedure for this process?
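Whatever procedure ends up being used, it seems prudent to take a copy of the NameNode metadata before touching anything. A minimal sketch, assuming the metadata directories live under /hadoop/hdfs and that /backup is writable (both are placeholders, not values from the post):

```bash
# Hypothetical backup of the NameNode metadata before reinstalling Ambari.
# Replace the paths with your actual dfs.namenode.name.dir and dfs.journalnode.edits.dir values.

# Checkpoint the current namespace while no writes are happening.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace

# Pull the latest fsimage from the active NameNode as an extra copy.
hdfs dfsadmin -fetchImage /backup/fsimage-$(date +%F)

# Archive the on-disk metadata of the NameNode and JournalNode.
tar czf /backup/namenode-current-$(date +%F).tar.gz /hadoop/hdfs/namenode/current
tar czf /backup/journal-current-$(date +%F).tar.gz  /hadoop/hdfs/journal/hdfsha/current

hdfs dfsadmin -safemode leave
```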
Labels:
- HDFS
08-20-2022
02:23 PM
First, thank you so much for your help. I see the following example in the post:

[{"ConfigGroup":{"id":2,"cluster_name":"c1","group_name":"A config group","tag":"HDFS","description":"A config group","hosts":[{"host_name":"host1"}],"service_config_version_note":"change","desired_configs":[{"type":"hdfs-site","tag":"version1443587493807","properties":{"dfs.replication":"2","dfs.datanode.du.reserved":"1073741822"}}]}}]

I would appreciate a full example of how to call this API, using curl or the full Ambari API. One more note about version1443587493807: is this version number a "random" number that I need to set?
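Not an official answer - just a minimal sketch of how a payload of that shape is usually sent with curl, reusing the cluster name `c1` and config group id `2` from the example above (replace them and the Ambari host/credentials with your own values). As for the tag: it does not have to be random, it only has to be a value not used before for that config type; the usual convention is the string `version` followed by the current epoch time in milliseconds, which is what `version1443587493807` is.

```bash
# Hypothetical sketch: push a new hdfs-site version to config group id 2.
# Cluster name (c1), group id (2), host list and property values come from the example above.
TAG="version$(date +%s%3N)"   # any previously unused tag works

curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  "http://ambari_server_node:8080/api/v1/clusters/c1/config_groups/2" \
  -d '[{"ConfigGroup":{"id":2,"cluster_name":"c1","group_name":"A config group","tag":"HDFS","description":"A config group","hosts":[{"host_name":"host1"}],"service_config_version_note":"change","desired_configs":[{"type":"hdfs-site","tag":"'"$TAG"'","properties":{"dfs.replication":"2","dfs.datanode.du.reserved":"1073741822"}}]}}]'
```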
08-20-2022
02:13 PM
Hi smohanty, can you show me how to configure dfs.replication with a full example (e.g. using curl)? You mentioned version1443587493807 - according to what do I need to set this "version number"?
08-19-2022
04:12 AM
Hi all,

We have an HDP cluster with 528 DataNode machines. In Ambari HDFS Configs we configured 3 config groups, because:

1) 212 DataNode machines have 32G of RAM
2) 119 DataNode machines have 64G of RAM
3) 197 DataNode machines have 128G of RAM

So in Ambari we have the corresponding config group settings. Now we need to configure the parameter "DataNode maximum Java heap size" (dtnode_heapsize) according to the machine memory, so we want to set the following:

- on the 212 DataNode machines with 32G, DataNode maximum Java heap size will be set to 10G
- on the 119 DataNode machines with 64G, DataNode maximum Java heap size will be set to 15G
- on the 197 DataNode machines with 128G, DataNode maximum Java heap size will be set to 20G

In order to configure the DataNode maximum Java heap size for each config group, we tried to use the configs.py tool:

/var/lib/ambari-server/resources/scripts/configs.py --user=admin --password=admin --port=8080 --action=set --host=ambari_server_node --cluster=hdp_cluster7 --config-type=hadoop-env -k "dtnode_heapsize" -v "10000"

The CLI above should configure the parameter dtnode_heapsize to 10G (10000M). When we run it, dtnode_heapsize is updated, but not on the config groups! What gets updated is the parameter in the "Default" group.

So how do we set the parameter dtnode_heapsize on the relevant config group? We are not sure that configs.py supports configuration of config groups at all; in that case we probably need another approach, such as calling the config-group REST API directly (see the sketch below).

Note: the target is to automate the settings in Ambari by API/REST API or scripts, so changing them manually is not acceptable.
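As a starting point for the REST approach, a minimal sketch (using the cluster name and Ambari host from the command above; credentials and the exact field list are assumptions) that lists the existing config groups and their ids, since configs.py only touches the Default group:

```bash
# Hypothetical sketch: list the config groups and their ids so each one can be
# updated directly over the Ambari REST API.
curl -s -u admin:admin -H 'X-Requested-By: ambari' \
  "http://ambari_server_node:8080/api/v1/clusters/hdp_cluster7/config_groups?fields=ConfigGroup/id,ConfigGroup/group_name,ConfigGroup/tag"

# Each group can then be updated with a PUT to
#   /api/v1/clusters/hdp_cluster7/config_groups/<id>
# carrying a desired_configs entry of type hadoop-env with the wanted dtnode_heapsize,
# in the same ConfigGroup JSON shape discussed elsewhere in this thread.
```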
Labels:
- Ambari Blueprints
12-23-2021
05:01 AM
We have a Hadoop cluster that includes `datanode` machines and 5 `kafka` machines. The Kafka machines are installed as part of the Hortonworks packages; the `kafka` version is 0.1X.

On the `datanode` machines we run the `deeg_data` applications as executors that consume data from `kafka` topics, so the `deeg_data` applications consume data from topic partitions that exist on the kafka cluster (`deeg_data` uses the `kafka` client for consuming).

In the last days we saw that our application `deeg_data` failed, and we started to look for the root cause. On the `kafka` cluster we see the following behavior:

/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --group deeg_data --describe --bootstrap-server kafka1:6667
To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files> where num_of_file > 0
GC log rotation is turned off
Consumer group ‘deeg_data’ is rebalancing

From the `kafka` side the cluster is healthy, all topics are balanced, and all kafka brokers are up and registered correctly in zookeeper.

After some time (a couple of hours) we ran the command again, this time without the error about `Consumer group ‘deeg_data’ is rebalancing`, and we got the following correct results:

/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --group deeg_data --describe --bootstrap-server kafka1:6667
To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files> where num_of_file > 0
GC log rotation is turned off

GROUP     TOPIC            PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG     OWNER
deeg_data pot.sdr.proccess 0         6397256247     6403318505     6062258 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 1         6397329465     6403390955     6061490 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 2         6397314633     6403375153     6060520 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 3         6397258695     6403320788     6062093 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 4         6397316230     6403378448     6062218 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 5         6397325820     6403388053     6062233 consumer-1_/10.3.6.237
. . . .

So we want to understand why we get `Consumer group ‘deeg_data’ is rebalancing` - what is the reason for this state, and why does the group go into rebalancing?
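A group goes into "rebalancing" whenever its membership changes (a consumer instance joins, leaves, crashes, or stops polling within its timeout), so it is usually worth correlating the rebalances with what the deeg_data executors were doing at that moment. A small helper sketch for that, using only the describe command already shown above (the 30-second interval and the log path are arbitrary choices):

```bash
# Hypothetical helper: sample the consumer group state periodically and log it,
# so rebalances can be correlated with application restarts or long processing pauses.
while true; do
  date
  /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
      --bootstrap-server kafka1:6667 --group deeg_data --describe
  sleep 30
done | tee /tmp/deeg_data_group_watch.log
```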
Labels:
- Apache Kafka
12-20-2021
01:57 PM
We have 3 Kafka brokers on Linux RHEL 7.6 (3 Linux machines). The kafka version is 2.7.X and the broker IDs are `1010,1011,1012`.

From the kafka topic describe output we can see the following:

Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 2 Leader: 1011 Replicas: 1010,1012,1011 Isr: 1011,1012
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
Topic: __consumer_offsets Partition: 4 Leader: 1011 Replicas: 1012,1010,1011 Isr: 1011
Topic: __consumer_offsets Partition: 5 Leader: none Replicas: 1010,1011,1012 Isr: 1010

From the Zookeeper CLI we can see that broker id `1010` is not registered:

[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]

And in `state-change.log` we can see the following:

[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)

By `ls -ltr` we can see that `controller.log` and `state-change.log` have not been updated since `Dec 16`:

-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka 68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka 6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka 323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka 15669106 Dec 20 19:50 kafkaServer.out

What we did until now: we restarted all 3 zookeeper servers and we restarted all kafka brokers, but kafka broker `1010` still appears with `leader none` and is still missing from the zookeeper data.

**Additional info**

[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0

**From kafka01**

more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010

**Relevant ideas**

In the topics disk we have the following files (in addition to the topic partitions):

-rw-r--r-- 1 root kafka 91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka 161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka 1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka 80 Dec 17 09:42 log-start-offset-checkpoint

Any idea whether deleting one or more of the above files could help with our issue?
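The WARN lines say the replicas on broker 1010 are "in an offline log directory", which usually points at a storage problem on that broker rather than at the checkpoint files, so deleting them is unlikely to be the first thing to try. A minimal sketch of checks worth running on broker 1010 before anything destructive - the paths `/var/kafka/kafka-logs` and `/var/log/kafka` are placeholders, not taken from the post:

```bash
# Hypothetical checks on the host running broker id 1010.
# LOG_DIR is a placeholder for whatever log.dirs points to in that broker's server.properties.
LOG_DIR=/var/kafka/kafka-logs

# Find the original I/O error that made the broker mark the directory offline.
grep -iE 'offline|IOException|KafkaStorageException' /var/log/kafka/server.log* | head -n 20

# Check that the disk is mounted, has free space and is writable by the kafka user.
df -h "$LOG_DIR"
ls -ld "$LOG_DIR"
sudo -u kafka touch "$LOG_DIR/.write_test" && rm -f "$LOG_DIR/.write_test"

# Only after the storage problem is fixed, restart this broker and re-check that
# id 1010 shows up again under /brokers/ids (as in the zookeeper ls above).
```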
Labels:
- Apache Kafka
11-24-2021
01:10 PM
We have an HDP cluster (Hadoop cluster version 2.6.5) and we added the Ranger and Ranger KMS services to the cluster. After adding the Ranger KMS service and doing some settings, we performed the following:

[hdfs@worker01 tmp]$ hdfs dfs -mkdir /zone_encr_1
[hdfs@worker01 tmp]$ hdfs crypto -createZone -keyName secret_hdp1 -path /zone_encr_1
Added encryption zone /zone_encr_1
[hdfs@worker01 tmp]$ hdfs dfs -copyFromLocal file.txt /zone_encr_1
[hdfs@worker01 tmp]$ hdfs dfs -cat /zone_encr_1/file.txt
hello every one
[hdfs@worker01 tmp]$ hdfs dfs -ls /zone_encr_1/file.txt
-rw-r--r-- 2 hdfs hdfs 23 2021-11-24 20:19 /zone_encr_1/file.txt
[hdfs@worker01 tmp]$ hdfs crypto -listZones
/zone_encr     secret_hdp1
/zone_encr_new secret_hdp1
/zone_encr_1   secret_hdp1

As you can see above: first we created the folder /zone_encr_1 in HDFS, then we added encryption to the folder /zone_encr_1, and then we copied the local file file.txt, which contains the text "hello every one", into the HDFS folder /zone_encr_1. Then we ran the test with `hdfs dfs -cat /zone_encr_1/file.txt`. We expected to get an encrypted file, but we did not - we still get the file as:

hello every one

Since I am just learning the Ranger KMS capabilities, I am not sure if I missed something.

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/copy-to-from-encr-zone.html
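For what it's worth, this looks like expected behavior: HDFS transparent encryption encrypts the data at rest, and an authorized client (here the `hdfs` user, which has access to the EZ key) decrypts it automatically on read, so `-cat` showing plaintext does not mean encryption is off. A small sketch of how the raw on-disk bytes can be inspected through the `/.reserved/raw` path, which normally requires the HDFS superuser:

```bash
# Normal read: the client fetches the EDEK, asks the KMS to decrypt it, and returns
# plaintext - the expected result for an authorized user.
hdfs dfs -cat /zone_encr_1/file.txt

# Raw read: /.reserved/raw bypasses decryption and returns the encrypted bytes as
# stored on the DataNodes (superuser access required).
hdfs dfs -cat /.reserved/raw/zone_encr_1/file.txt | od -c | head
```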
Labels:
- Apache Ranger