Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1942 | 06-15-2020 05:23 AM |
| | 15734 | 01-30-2020 08:04 PM |
| | 2087 | 07-07-2019 09:06 PM |
| | 8152 | 01-27-2018 10:17 PM |
| | 4627 | 12-31-2017 10:12 PM |
12-31-2018 06:05 PM
We also get this from the ZooKeeper service check:
```
Welcome to ZooKeeper!
JLine support is enabled
[zk: zookeper_server.sys54.com:2181(CONNECTING) 0] ls /
Command failed after 1 tries
```
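As a quick liveness check alongside the zkCli session above, ZooKeeper's four-letter-word commands can be sent over nc (a minimal sketch; the hostname is taken from the output above, and on newer ZooKeeper versions these commands must be whitelisted):
```bash
# Ask the server whether it is running; a healthy server answers "imok".
echo ruok | nc zookeper_server.sys54.com 2181

# Dump server stats: connection count, latency, and leader/follower mode.
echo stat | nc zookeper_server.sys54.com 2181
```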
12-31-2018 05:23 PM
And after 10 minutes we get this:
```
netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34425 -
23.1.35.197:34416 -
23.1.35.197:34392 -
23.1.35.197:34389 -
23.1.35.197:34462 63468/java
23.1.35.197:34401 -
23.1.35.197:34358 -
23.1.35.197:34437 -
23.1.35.197:34361 -
23.1.35.197:34451 63468/java
23.1.35.197:34354 -
23.1.35.197:34360 -
23.1.35.197:34368 -
23.1.35.197:34444 -
23.1.35.197:34459 63468/java
23.1.35.197:34442 -
23.1.35.197:34391 -
23.1.35.197:34440 -
23.1.35.197:34452 63468/java
```
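To see how many of these sockets belong to a live process versus having no owner, the same netstat output can be aggregated per owning PID/program (a minimal sketch reusing the server address from above):
```bash
# Count ZooKeeper client sockets per owning PID/program ("-" means no owning process).
netstat -nape | awk '$5 == "23.1.35.197:2181" {count[$9]++} END {for (p in count) print p, count[p]}'
```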
12-31-2018 05:11 PM
We get this (after restarting the ZooKeeper service from Ambari):
```
netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34065 -
23.1.35.197:34071 -
23.1.35.197:34053 -
23.1.35.197:34066 -
23.1.35.197:34068 -
23.1.35.197:34079 63468/java
23.1.35.197:34082 63468/java
23.1.35.197:34052 -
23.1.35.197:34063 -
23.1.35.197:34069 -
23.1.35.197:34075 63468/java
23.1.35.197:34084 63468/java
23.1.35.197:34061 -
23.1.35.197:34078 63468/java
```
12-31-2018 04:40 PM
1. What application is running at that particular time? This can be caused by a bug in user code; check the offending application with netstat. - need to verify
2. Ensure that the configured maximum number of client connections is sufficient to avoid the loss of connections. - OK
3. Update the value of the maxClientCnxns configuration parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file on the ZooKeeper ensemble (see the sketch after this list). - OK
4. Ensure that you have no system issues with CPU, memory, disk input/output, or other system resources. - OK
5. ZooKeeper is sensitive to NTPD functionality; make sure the clocks are synchronized across the ensemble. - OK
6. Restart the ZooKeepers through Ambari. - no need, since we already restarted a couple of times with the same results
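For reference, a minimal zoo.cfg fragment showing where that parameter lives (the value 300 mirrors the one mentioned in the question below; the rest of the file is omitted):
```
# ZooKeeper-installation-directory/conf/zoo.cfg
# Maximum concurrent client connections per source host (0 = unlimited).
maxClientCnxns=300
```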
12-31-2018 03:51 PM
We checked all your comments and I don't see a problem, except for step 1, where you said: "What application is running at that particular time? This can be caused by a bug in user code; check the offending application with netstat." Can you please suggest how to verify this? (What exactly do we need to look for in the netstat output?)
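For context, one common way to spot a leaking client is to count sockets per remote address talking to the ZooKeeper port (a hedged sketch, assuming the standard port 2181; a remote host with a disproportionate count is the usual suspect):
```bash
# Tally established connections to port 2181 by remote host, highest count first.
netstat -nt | awk '$4 ~ /:2181$/ {split($5, a, ":"); count[a[1]]++} END {for (ip in count) print count[ip], ip}' | sort -rn
```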
12-31-2018 01:51 PM
Environment: HDP 2.6.4, Ambari 2.6.1, 3 ZooKeeper servers.

Hi all,

On the first ZooKeeper server it seems that even after closing, connections to ZooKeeper are not actually getting closed, which causes the maximum number of client connections from a host to be reached - we have maxClientCnxns set to 60 in the ZooKeeper config. As a result, when a new application comes along and tries to create a connection, it fails.

For example, when connections look like this:
```
echo stat | nc 23.1.35.185 2181
Latency min/avg/max: 0/71/399
Received: 3031
Sent: 2407
Connections: 67
Outstanding: 622
Zxid: 0x130000004d
Mode: follower
Node count: 3730
```

But after some time, when the connection count reaches ~70, we see:
```
echo stat | nc 23.1.35.185 2181
Ncat: Connection reset by peer.
```

And we can also see many sockets in CLOSE_WAIT:
```
java 58936 zookeeper 60u IPv6 381963738 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44983 (CLOSE_WAIT)
java 58936 zookeeper 61u IPv6 381963798 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:45034 (CLOSE_WAIT)
java 58936 zookeeper 62u IPv6 381963667 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44956 (CLOSE_WAIT)
java 58936 zookeeper 63u IPv6 381949363 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44911 (CLOSE_WAIT)
java 58936 zookeeper 64u IPv6 381964358 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44957 (CLOSE_WAIT)
java 58936 zookeeper 65u IPv6 381963638 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44940 (CLOSE_WAIT)
java 58936 zookeeper 66u IPv6 381963640 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44943 (CLOSE_WAIT)
java 58936 zookeeper 67u IPv6 381963642 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44945 (CLOSE_WAIT)
```

From the ZooKeeper log:
```
2018-12-26 02:50:46,382 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:46,429 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:46,849 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:47,645 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:47,845 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:48,180 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:49,035 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
```

In Ambari we can also see: Connection failed: [Errno 104] Connection reset by peer to zookeper_server.sys54.com.:2181

I must say that this is not happening on ZooKeeper servers 2 and 3. So, any hint why the connections are stuck in CLOSE_WAIT?

NOTE - increasing maxClientCnxns to 300 does not help, because after some time we get more than 300 connections and then we see in the log:
```
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
```
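To see exactly which clients hold those sockets, ZooKeeper's cons four-letter command lists every open connection with its client address and session details, and ss can tally the CLOSE_WAIT backlog (a sketch reusing the first server's address from above):
```bash
# List every client connection the ZooKeeper server currently tracks.
echo cons | nc 23.1.35.185 2181

# Count sockets stuck in CLOSE_WAIT on the ZooKeeper client port (skip the header line).
ss -tan state close-wait '( sport = :2181 )' | tail -n +2 | wc -l
```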
12-27-2018 10:30 AM
1 Kudo
Thank you, but one of my questions was about changing the YARN configuration by deleting /var/hadoop/yarn/local from YARN. After that, as you know, YARN requires a restart and we need to do it, but we are a little worried that the restart will fail because of this change. Or maybe it is safe to make this change? What is your opinion?
12-27-2018 09:48 AM
1 Kudo
@Jagadeesan A S, can you also look at my question at this link - https://community.hortonworks.com/questions/232062/yarn-local-dirs-safety-moving-the-local-dir-to-dat.html
12-27-2018 09:44 AM
Hi all,

We have an Ambari cluster with 134 DataNode machines. In YARN --> CONFIG, yarn.nodemanager.local-dirs is configured as follows:

/var/hadoop/yarn/local,/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local,/grid/sdd/hadoop/yarn/local,/grid/sde/hadoop/yarn/local,/grid/sdf/hadoop/yarn/local

We want to remove /var/hadoop/yarn/local from the configuration (see the sketch below), which also requires a YARN restart and maybe restarts of other services. We intend to do this to avoid writing to the local disk (/var). Since we have 143 DataNode machines in our Ambari cluster, we are worried about this action of removing /var/hadoop/yarn/local from yarn.nodemanager.local-dirs - or maybe we can do it safely? We would be happy to get Hortonworks' opinion. We know that it is generally not a good idea to use /hadoop/yarn/local for yarn.nodemanager.log-dirs, which hold container logs; typically we prefer to direct only these logs to all the data mount points (like /grid/sdb/hadoop/yarn/local).
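A minimal sketch of the intended change (property name and paths are taken from the post; only the /var entry is dropped):
```
# yarn.nodemanager.local-dirs -- current value
/var/hadoop/yarn/local,/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local,/grid/sdd/hadoop/yarn/local,/grid/sde/hadoop/yarn/local,/grid/sdf/hadoop/yarn/local

# yarn.nodemanager.local-dirs -- intended value after removing the /var entry
/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local,/grid/sdd/hadoop/yarn/local,/grid/sde/hadoop/yarn/local,/grid/sdf/hadoop/yarn/local
```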
12-27-2018 09:19 AM
Hi all,

Regarding the DataNode machines in our Ambari cluster: in YARN --> CONFIG we configured yarn.nodemanager.local-dirs to /var/hadoop/yarn/local, and as we can see, blockmgr-XXX folders are created there and take up a lot of space on the local disk (/var). Is it safe to remove the old blockmgr-XXX folders (say, from 1 day ago)?
```
9433484 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-d8628834-137a-4a7b-a833-69a47a4a8d53
9319472 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-67632cb1-d552-41b9-9944-31954964ad9c
8588556 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-d159be18-ca81-4974-9248-e79ac5db7981
7622108 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-766b5a33-8478-4102-9457-47535ba57ddc
```
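For illustration only, a hedged sketch of how such folders could be listed before deciding anything (the path and the 1-day cutoff come from the post; whether deletion is safe depends on whether the owning application is still running, so this lists candidates rather than deleting them):
```bash
# List blockmgr-* directories untouched for more than a day, with their sizes; review before removing.
find /var/hadoop/yarn/local/usercache/*/appcache -maxdepth 2 -type d -name 'blockmgr-*' -mtime +1 -exec du -sh {} \;
```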