Member since: 04-14-2020
Posts: 4035
Kudos Received: 4
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1678 | 09-22-2020 12:19 AM |
| | 3812 | 07-07-2020 04:56 AM |
| | 2641 | 05-15-2020 12:20 AM |
| | 18376 | 05-14-2020 04:29 AM |
08-04-2021
01:56 AM
Hi @iamfromsky , Thank you for reaching out to our community! The error message you provided is logged when either a "broken pipe" or a "connection reset" occurs, which is most likely network-related. Please check whether the network is stable when you see these errors. Also refer to Jira HDFS-8814 for more details.
09-22-2020
12:19 AM
Hello @Mondi , When you install the CDP Trial Version, it includes an embedded PostgreSQL database, which is not suitable for a production environment. Please check this information for more details. Also see how to end or upgrade the trial, and Managing Licenses.
07-07-2020
04:56 AM
2 Kudos
Hi @shrikant_bm , Whenever the active NameNode server goes down, its associated daemon also goes down. HA works the same way whether the active NameNode daemon or the whole server goes down: ZKFC will not receive the heartbeat, the ZooKeeper session will expire, and the other NameNode will be notified that a failover should be triggered. To answer your question: yes, HA should work in both of the cases you mentioned.
07-06-2020
03:34 AM
Hi @shrikant_bm , Please find the answers inline.

1. When the active NameNode server is rebooted, will the standby NameNode become active? Is this expected, or did HA not work in our cluster?
Yes, the standby NameNode will become active when the primary NameNode reboots, provided high availability is enabled and configured.

2. Is HA expected to work only between the active and standby NameNode daemons?
Yes, HA works only between the active and standby NameNodes. This is handled by ZKFC (the ZooKeeper Failover Controller).
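The failover behavior described above can also be checked from the command line. A minimal sketch using the standard `hdfs haadmin` utility, assuming the NameNode service IDs are `nn1` and `nn2` (your `dfs.ha.namenodes.*` values may differ):

```shell
# Check which NameNode is currently active (nn1/nn2 are assumed service IDs)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Manually initiate a failover from nn1 to nn2
# (with automatic failover enabled, ZKFC normally does this for you)
hdfs haadmin -failover nn1 nn2
```

These commands require a cluster with NameNode HA configured; run them as the HDFS admin user.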
07-06-2020
12:42 AM
Hi @shrikant_bm , Thank you for reaching out to the community! If NameNode high availability is enabled and configured on your cluster, automatic failover of the active NameNode should work. [1] gives the steps for configuring NameNode high availability using Ambari: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-high-availability/content/amb_enable_namenode_high_availability.html [2] gives the steps for managing high availability of services for other components: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-high-availability/content/amb_managing_high_availability_of_services.html
05-15-2020
12:20 AM
1 Kudo
Hello @cyborg , Thank you for reaching out to the community! There are two ways to place a node in maintenance mode.

1) Select the host > Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission). The Begin Maintenance (Suppress Alerts/Decommission) dialog box opens, with the role instances running on the host displayed at the top. Deselect the Decommission Host(s) option and click Begin Maintenance. The Host Decommission Command dialog box opens and displays the progress of the command. To exit maintenance: Select the host > Actions for Selected > End Maintenance, deselect the Recommission Host(s) option, and click End Maintenance. This re-enables alerts for the host. This first option does not prevent events from being logged; it only suppresses the alerts that those events would otherwise generate, and you can see a history of all events recorded for entities while they were in maintenance mode. This is useful when you need to take actions in your cluster (make configuration changes and restart various elements) and do not want to see the alerts those actions would generate. For more details, refer to https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_maint_mode.html#cmug_topic_14_1

2) Select the host > Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission). In the dialog box, select Decommission Host(s). If the selected host runs a DataNode role, you can specify whether to replicate under-replicated data blocks to other DataNodes to maintain the cluster's replication factor; if the host is not running a DataNode role, you will only see the Decommission Host(s) option. Click Begin Maintenance; the Host Decommission Command dialog box opens and displays the progress of the command. To exit maintenance: Select the host > Actions for Selected > Recommission Host(s), choose whether to bring the hosts online and start all roles now or later, then click End Maintenance. This second option suits minor maintenance on cluster hosts, such as adding memory or changing network cards or cables, where a maintenance window is expected.

In your case: you can suppress alerts and follow the first path you described in the question (taking a single node down for a few hours, provided there are no under-replicated blocks and your replication factor is greater than 1).
05-14-2020
04:29 AM
Hi @Amn_468 , Thank you for replying. Kindly try increasing spark.rpc.askTimeout from the default 120 seconds to a higher value in Ambari UI > Spark Configs > spark2-defaults. The recommendation is to increase it to at least 480 seconds and restart the necessary services. Possibly the driver and executor are not able to get a heartbeat response within the configured timeout. If you don't want to make a cluster-level change, you can override this value at the job level, for example by adding --conf spark.rpc.askTimeout=600s to spark-submit when submitting the job.
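The job-level override mentioned above can be sketched as follows; the application class and jar names here are placeholders, not from the original thread:

```shell
# Override the RPC ask timeout for a single job, without a cluster-level change
# (com.example.MyApp and myapp.jar are hypothetical placeholders)
spark-submit \
  --master yarn \
  --conf spark.rpc.askTimeout=600s \
  --class com.example.MyApp \
  myapp.jar
```

A job-level `--conf` takes precedence over the value in spark2-defaults, so this is a good way to test the higher timeout before changing the cluster configuration.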
05-13-2020
11:40 PM
Hello @Amn_468 , To better assist you with this issue, could you please provide the following additional information: 1) Is this issue occurring for all jobs or only some jobs? 2) If the issue started only recently, does it coincide with any code or configuration changes in the job itself, or with configuration changes in the cluster?