Member since 08-08-2017
1652 Posts
30 Kudos Received
11 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1994 | 06-15-2020 05:23 AM |
| | 16387 | 01-30-2020 08:04 PM |
| | 2143 | 07-07-2019 09:06 PM |
| | 8327 | 01-27-2018 10:17 PM |
| | 4721 | 12-31-2017 10:12 PM |
06-28-2021
06:12 AM
1 Kudo
We have an HDP cluster with 2 ResourceManager services and 190 NodeManager services.

HDP version: 2.6.5
YARN version: 2.7.3
Hadoop platform: Ambari 2.6.2.1

Each NodeManager runs on its own Linux VM. We now want to extend the cluster to 220 NodeManager machines.

The questions I want to ask:
Can the ResourceManager support 220 NodeManager services (with each NodeManager service installed on its own Linux machine)?
What is the maximum number of NodeManager services that one ResourceManager can support?
Labels: Apache Ambari
06-28-2021
06:10 AM
We have a Hadoop cluster (HDP 2.6.5 with Ambari, 25 DataNode and NodeManager machines).
We run a Spark Streaming application (Spark 2.1 on Hortonworks 2.6.x).
Currently the Spark Streaming applications run on all DataNode and NodeManager machines,
but in the ResourceManager logs we see the following INFO messages:
2021-06-27 14:07:01,456 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0004 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,456 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0003 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,456 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0009 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
Regarding "Reservation Exceeds Allowed number of nodes":
what configuration should we change in order to avoid these messages?
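For context, the numbers in these log lines are consistent with each other. The sketch below assumes the Fair Scheduler computes the limit as the ceiling of totalAvailableNodes multiplied by reservableNodesRatio (that is my reading of FSAppAttempt.reservationExceedsThreshold in Hadoop 2.7, so treat it as an assumption). As far as I know the ratio comes from yarn.scheduler.fair.reservable-nodes in yarn-site.xml (default 0.05). The function name is hypothetical and only illustrates the arithmetic.

```python
import math


def allowed_reservations(total_available_nodes: int, reservable_nodes_ratio: float) -> int:
    """Assumed formula: ceil(totalAvailableNodes * reservableNodesRatio)."""
    return math.ceil(total_available_nodes * reservable_nodes_ratio)


# Values taken from the log lines: 181 nodes, ratio 0.05.
print(allowed_reservations(181, 0.05))  # 10, matches numAllowedReservations=10

# Hypothetical higher ratio, only to show how the limit would move.
print(allowed_reservations(181, 0.10))  # 19
```

If that assumption holds, the message appears whenever an application already holds 10 reservations, so a larger ratio would raise the limit rather than hide the message.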
06-27-2021
11:13 AM
We have 2 ResourceManagers working as part of an HDP cluster. The first ResourceManager fails after a couple of minutes.

In the ResourceManager log we see the following lines repeated many times:

2021-06-27 14:07:16,022 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(152)) - Assigned container container_e83_1624802728037_0001_01_004355 of capacity <memory:86016, vCores:5> on host datanode23.fgtf.com:45454, which has 5 containers, <memory:199680, vCores:15> used and <memory:27708, vCores:75> available after allocation
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0001 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0004 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0003 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10

and

2021-06-27 14:07:01,282 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(382)) - dr.who is accessing unchecked http://22.20.101.13:53092/api/v1/applications/application_1624802728037_0009/executors which is the app master GUI of application_1624802728037_0009 owned by hdfs
2021-06-27 14:07:01,282 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(382)) - dr.who is accessing unchecked http://22.20.14.5:36198/api/v1/applications/application_1624801538018_0011/executors which is the app master GUI of application_1624801538018_0011 owned by hdfs

Any idea what this INFO message means?

INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0001 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
Labels: Apache Ambari
03-21-2021
10:37 PM
Regarding the Ambari API command: can you show me the full syntax that changes the setting from disabled to enabled?
03-21-2021
08:30 AM
In Ambari, the following feature is disabled under the YARN Configs. What is the relevant Ambari REST API call to change the state of CPU Scheduling from disabled to enabled?
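To make the question concrete, here is a sketch of the request body I would expect such a call to need. It is not a confirmed answer. It assumes that Ambari applies configuration changes through a PUT on /api/v1/clusters/<cluster name> with a desired_config object (sent with the X-Requested-By header), and that the CPU Scheduling toggle maps to yarn.scheduler.capacity.resource-calculator in the capacity-scheduler config type. Both points are assumptions, and the function name and tag value are hypothetical. The code only builds the JSON, it does not contact any server.

```python
import json
import time


def build_desired_config(config_type, current_properties, overrides, tag=None):
    """Build the body for an Ambari desired_config update (assumed API shape).

    The full property map is sent, so the current properties are copied
    and only the overridden keys are changed.
    """
    properties = dict(current_properties)
    properties.update(overrides)
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                # A new, unique tag per change; the format is an assumption.
                "tag": tag or "version{}".format(int(time.time() * 1000)),
                "properties": properties,
            }
        }
    }


# Assumption: "CPU Scheduling" corresponds to this resource calculator setting.
current = {
    "yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
}
payload = build_desired_config(
    "capacity-scheduler",
    current,
    {"yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"},
    tag="version-example",
)
print(json.dumps(payload, indent=2))
```

In a real cluster the current properties would first have to be read from Ambari so that no existing setting is dropped, and the affected services would need a restart afterwards.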
Labels: Apache Ambari
02-16-2021
06:47 AM
Hi all,

We are trying to download Ambari version 2.6.1, but without success (according to https://docs.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDF3/HDF-3.1.1/bk_installing-hdf-on-hdp-ppc/content/ambari_repositories.html):

wget http://public-repo-1.hortonworks.com/ambari/centos7-ppc/2.x/updates/2.6.1.0
--2021-02-16 14:44:08-- http://public-repo-1.hortonworks.com/ambari/centos7-ppc/2.x/updates/2.6.1.0
Resolving public-repo-1.hortonworks.com (public-repo-1.hortonworks.com)... 13.225.255.100, 13.225.255.128, 13.225.255.124, ...
Connecting to public-repo-1.hortonworks.com (public-repo-1.hortonworks.com)|13.225.255.100|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-02-16 14:44:09 ERROR 403: Forbidden.

Any idea why this version cannot be downloaded?

We can, however, download from another site:

wget http://archive.apache.org/dist/ambari/ambari-2.6.1/apache-ambari-2.6.1-src.tar.gz
Labels: Ambari Blueprints
02-11-2021
06:22 AM
Can you describe more about "The rebalance (by Blockpool"? We have an HDP cluster with Ambari, so we are not sure what we need to do.
01-20-2021
10:27 PM
We have an Ambari cluster, HDP version 2.6.5. The cluster includes two NameNodes (one active, one standby) and 65 DataNode machines.

We have a problem with the standby NameNode: it does not start, and in the NameNode logs we can see the following:

2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.

In Ambari we can see that, for now, the active NameNode is up but the standby NameNode is down, and the root cause of this issue is that the **NameNode metadata is damaged/corrupted.**

So we have two solutions, A or B.

A) Run the following recovery on the standby NameNode:

su hadoop namenode -recover

B) Put the active NameNode in safe mode:

su hdfs
hdfs dfsadmin -safemode enter

Do a saveNamespace operation on the active NameNode:

su hdfs
hdfs dfsadmin -saveNamespace

Leave safe mode:

su hdfs
hdfs dfsadmin -safemode leave

Log in to the standby NameNode and run the command below to get the latest fsimage that we saved in the steps above:

su hdfs
hdfs namenode -bootstrapStandby -force

What is the preferred solution (A or B) for our problem?
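For reference, the order of the commands in solution B can be written down as a small dry-run script. This is only a sketch of the sequence quoted above: the list and function names are hypothetical, the commands are assumed to be run as the hdfs user on the host named in each list, and it says nothing about whether A or B is the better choice.

```python
import subprocess

# Commands from solution B, to be run on the active NameNode host.
ACTIVE_NN_STEPS = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],
    ["hdfs", "dfsadmin", "-saveNamespace"],
    ["hdfs", "dfsadmin", "-safemode", "leave"],
]

# Command from solution B, to be run on the standby NameNode host.
STANDBY_NN_STEPS = [
    ["hdfs", "namenode", "-bootstrapStandby", "-force"],
]


def run_steps(steps, dry_run=True):
    """Print each command in order, or execute it when dry_run is False.

    With dry_run=False, check=True stops the sequence at the first failure.
    Returns the commands as strings, in the order they were handled.
    """
    handled = []
    for argv in steps:
        command = " ".join(argv)
        handled.append(command)
        if dry_run:
            print("would run:", command)
        else:
            subprocess.run(argv, check=True)
    return handled


run_steps(ACTIVE_NN_STEPS)
run_steps(STANDBY_NN_STEPS)
```

Stopping at the first failing command matters here, because leaving safe mode or bootstrapping the standby after a failed saveNamespace would not give the standby the fsimage the procedure relies on.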
Labels: HDFS
01-19-2021
11:07 AM
We have a Hadoop cluster with 2 NameNodes (active and standby) and 12 DataNodes. All 12 DataNode machines have disks for HDFS.

We are about to run `hadoop namenode -recover`, because we suspect that files such as fsimage_0000000000001253918 or edits_0000000000001203337-0000000000001214475 are corrupted.

So to recover the HDFS metadata we can do the following:

$ hadoop namenode -recover
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
21/01/19 17:56:35 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = master1.sys67.com/17.2.12.78
STARTUP_MSG: args = [-recover]
STARTUP_MSG: version = 2.7.3.2.6.5.0-292
21/01/19 17:56:35 INFO namenode.NameNode: createNameNode [-recover]
You have selected Metadata Recovery mode. This mode is intended to recover lost metadata on a corrupt filesystem. Metadata recovery mode often permanently deletes data from your HDFS filesystem. Please back up your edit log and fsimage before trying this!
Are you ready to proceed? (Y/N)
(Y or N) y

The question is: could this action also affect the data itself on the DataNode machines, or only the metadata on the NameNode machines?
Labels: HDFS
01-19-2021
09:10 AM
We have an Ambari cluster, HDP version `2.6.5`. The cluster includes two NameNodes (one active, one standby) and 65 DataNode machines.

We have a problem with the standby NameNode: it does not start, and in the NameNode logs we can see the following:

2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)

For now the active NameNode is up but the standby NameNode is down.

Regarding "java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.": what is the preferred solution to fix this problem?
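To put a number on the gap: the two transaction IDs in the exception differ by 376297, so the standby is missing the edits for txid 90247527115 through 90247903411. A small sketch of that arithmetic follows; the function name is hypothetical and the regular expression only targets the message text quoted above.

```python
import re

MESSAGE = (
    "There appears to be a gap in the edit log. "
    "We expected txid 90247527115, but got txid 90247903412."
)


def edit_log_gap(message):
    """Return (expected_txid, got_txid, missing_count) parsed from the message."""
    match = re.search(r"expected txid (\d+), but got txid (\d+)", message)
    if match is None:
        raise ValueError("not an edit-log gap message")
    expected, got = int(match.group(1)), int(match.group(2))
    return expected, got, got - expected


expected, got, missing = edit_log_gap(MESSAGE)
print(missing)  # 376297 transactions between the expected and the received txid
```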
Labels: HDFS