Member since: 08-08-2017 · 1652 Posts · 30 Kudos Received · 11 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2186 | 06-15-2020 05:23 AM |
|  | 18833 | 01-30-2020 08:04 PM |
|  | 2345 | 07-07-2019 09:06 PM |
12-23-2022
11:10 AM
I have the same problem, but in my case JVM pause detections happen every 15 minutes, reporting pauses ranging from 28337 ms up to 853466 ms.
12-09-2022
05:24 AM
2 Kudos
The following areas normally cause this problem:
1) The connection from the Ambari agent host to the Ambari server was lost.
2) A firewall is blocking the connections.
3) The hostname and IP address are not set correctly in /etc/hosts.
You can compare the output using this API:
> curl -u user:password http://AmbariHost:8080/api/v1/hosts
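To check point 3, it can help to compare the host names Ambari has registered against the local /etc/hosts file. A minimal sketch of that comparison (the helper name and the sample data are my own, not part of Ambari):

```python
# Sketch (hypothetical helper): flag host names that Ambari knows about
# but that have no entry in an /etc/hosts-style mapping -- a common cause
# of agents failing to register.

def find_unknown_hosts(ambari_hosts, etc_hosts_text):
    """Return Ambari host names missing from the hosts file text."""
    known = set()
    for line in etc_hosts_text.splitlines():
        line = line.split("#", 1)[0]   # strip comments
        parts = line.split()
        known.update(parts[1:])        # skip the IP, keep all names/aliases
    return sorted(h for h in ambari_hosts if h not in known)

# Example: node2 is registered in Ambari but missing from /etc/hosts.
hosts_file = """127.0.0.1 localhost
10.0.0.1 node1.example.com node1
"""
print(find_unknown_hosts(["node1.example.com", "node2.example.com"], hosts_file))
```

In a real check you would feed it the `host_name` values from the `/api/v1/hosts` response and the contents of `/etc/hosts` on each agent.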
10-21-2022
01:31 PM
Hi @mike_bronson7, Please follow this thread for the approach to handle such situations. https://community.cloudera.com/t5/Support-Questions/reinstall-ambari-or-add-existing-HDP-cluster-in-the-new/m-p/193982/highlight/true#M156042 Thank you.
01-24-2022
10:06 AM
1 Kudo
Hi @mike_bronson7
1. Do you see anything interesting in the broker 1010 log file? This is to understand why 1010 is not able to register in ZooKeeper.
2. Try forcing a new controller election:
[zk: localhost:2181(CONNECTED) 11] rmr /controller
3. Are the broker IDs unique? If you describe other topics, do you see the same broker IDs and the same behavior (leader "none" for some partitions)?
4. Finally, if this is a dev environment, either:
4.1 Enable unclean.leader.election.enable=true and restart the brokers, or
4.2 (if this is happening just for this topic) remove the __consumer_offsets topic (just from ZooKeeper) and restart Kafka.
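Why step 4.1 can fix "leader: none" can be seen in a toy model of leader election (an illustrative simplification, not Kafka's actual implementation): with clean election only an in-sync replica is eligible, so if no ISR member is alive the partition has no leader; the unclean setting widens the candidate pool to any live replica, at the cost of possible data loss.

```python
# Sketch (illustrative model of Kafka leader election, not its real code):
# clean election picks only from live in-sync replicas (ISR); unclean
# election falls back to any live replica when the ISR is unavailable.

def elect_leader(replicas, isr, live, unclean=False):
    candidates = [r for r in replicas if r in live and r in isr]
    if not candidates and unclean:
        candidates = [r for r in replicas if r in live]  # may lose data
    return candidates[0] if candidates else None         # None => leader "none"

replicas = [1010, 1011, 1012]
# Broker 1010 (the only ISR member) is down: no eligible leader.
print(elect_leader(replicas, isr=[1010], live=[1011, 1012]))
# With unclean election enabled, a live out-of-sync replica takes over.
print(elect_leader(replicas, isr=[1010], live=[1011, 1012], unclean=True))
```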
06-30-2021
08:48 AM
@mike_bronson7 Here you go: how to determine YARN and MapReduce memory configuration settings. Happy hadooping!
02-08-2021
02:25 PM
1 Kudo
Out of all the options available to deal with this situation, I think resetting your network configuration is the best. A network reset is a maintenance procedure that refreshes or repairs your network connectivity; it can eliminate latency and return your network to the state it was in when you first connected. To resolve your concern, we suggest that you reset your TCP/IP settings to their defaults.
01-26-2021
03:16 PM
1 Kudo
@mike_bronson7 This looks like a symptom of having the default replication factor of 3, which provides redundancy and processing capability within HDFS. A minimum of 3 DataNodes is recommended so the cluster can hold 3 healthy replicas of each block, because HDFS will not write two replicas of the same block to the same DataNode. In your scenario the blocks will be under-replicated, with 1 healthy replica placed on the available DataNode. You can run setrep [1] to change the replication factor; if you provide a path to a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path:
hdfs dfs -setrep -w 1 /user/hadoop/dir1
[1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep
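The one-replica-per-DataNode rule means the healthy replica count is simply capped by the number of live DataNodes, which is why a single-node cluster with replication factor 3 always reports under-replicated blocks. A tiny illustrative model (not HDFS's placement code):

```python
# Sketch (illustrative, not HDFS's block-placement implementation):
# HDFS puts at most one replica of a block on each DataNode, so healthy
# replicas are capped by the number of live DataNodes.

def replica_status(replication_factor, live_datanodes):
    """Return (healthy_replicas, missing_replicas) for one block."""
    placed = min(replication_factor, live_datanodes)
    return placed, replication_factor - placed

# Single DataNode with default replication 3: 1 healthy, 2 missing.
print(replica_status(3, 1))
```

Lowering the replication factor to match the node count (as with `setrep -w 1` above) removes the "missing" replicas from the report.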
10-10-2020
11:35 AM
1 Kudo
@mike_bronson7 Always stick to the Cloudera documentation. Yes, there is no risk in running that command; I can understand your reservation.
09-13-2020
09:08 AM
Thank you for the post, but another question. According to the document https://docs.cloudera.com/HDPDocuments/Ambari-2.7.0.0/administering-ambari/content/amb_changing_host_names.html, the last stage says that if NameNode HA is enabled, you need to run the following command on one of the NameNodes:
hdfs zkfc -formatZK -force
Since we have an active NameNode and a standby NameNode, we assume NameNode HA is enabled in our cluster. We want to understand the risks of running this command on one of the NameNodes: is it safe to run without risks?
08-14-2020
12:38 AM
1 Kudo
@mike_bronson7 Let me try to answer all 3 of your questions in one shot.
Snapshots
ZooKeeper keeps two types of files: snapshots and transaction logs. As changes are made to the znodes (additions or deletions), they are appended to a transaction log; occasionally, when a log grows large, a snapshot of the current state of all znodes is written to the filesystem. That snapshot supersedes all previous logs. To put it in context, it is like the edit logs and the fsimage in the NameNode architecture: every change made in HDFS is logged in the edit logs, and when a checkpoint kicks in, the edit logs are merged with the old fsimage to incorporate the changes since the last checkpoint. So a ZooKeeper snapshot is analogous to the fsimage: it contains the current state of the znode entries and ACLs.
Snapshot policy
In the command shared earlier, the snapshot count parameter is -n <count>. If you really want extra headroom you can increase it to 5 or 7, but I think 3 suffices, so I use the autopurge feature and keep only 3 snapshots and 3 transaction logs. When enabled, ZooKeeper's autopurge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in dataDir and dataLogDir respectively, and deletes the rest. It defaults to 3, which is also the minimum value.
Corrupt snapshots
ZooKeeper might not be able to read its database and fail to come up because of file corruption in the transaction logs of the ZooKeeper server; you will see an IOException while loading the ZooKeeper database. In such a case, make sure all the other servers in your ensemble are up and working: use the four-letter "stat" command on the client port to check their health. After you have verified that all the other servers of the ensemble are up, you can go ahead and clean the database of the corrupt server.
Solution
Delete all the files in dataDir/version-2 and dataLogDir/version-2, then restart the server.
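The autopurge retention policy described above can be sketched as a small model (illustrative only, not ZooKeeper's purge code; it assumes the standard snapshot.&lt;zxid-in-hex&gt; file naming):

```python
# Sketch (illustrative model of ZooKeeper autopurge, not its real code):
# keep the autopurge.snapRetainCount most recent snapshots, delete the rest.
# Snapshot files are named snapshot.<zxid in hex>; a higher zxid is newer.

def files_to_purge(snapshots, retain_count=3):
    """Return the snapshot files autopurge would delete (oldest first)."""
    by_zxid = sorted(snapshots, key=lambda s: int(s.split(".")[1], 16))
    return by_zxid[:-retain_count] if len(by_zxid) > retain_count else []

snaps = ["snapshot.a2", "snapshot.1f", "snapshot.c8", "snapshot.3b", "snapshot.77"]
print(files_to_purge(snaps, retain_count=3))  # the two oldest are purged
```

Note the hex sort: lexicographic file-name order would get the age wrong, which is why the model parses the zxid.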
Hope that helps