Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1942 | 06-15-2020 05:23 AM |
| | 15734 | 01-30-2020 08:04 PM |
| | 2087 | 07-07-2019 09:06 PM |
| | 8152 | 01-27-2018 10:17 PM |
| | 4627 | 12-31-2017 10:12 PM |
12-31-2018 06:05 PM
We also get this from the ZooKeeper service check:
```
Welcome to ZooKeeper!
JLine support is enabled
[zk: zookeper_server.sys54.com:2181(CONNECTING) 0] ls /
Command failed after 1 tries
```
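As a quick liveness check alongside the zkCli session above, ZooKeeper's four-letter-word commands can be sent over nc (a minimal sketch; the hostname is taken from the output above, and on newer ZooKeeper versions these commands must be whitelisted):
```bash
# Ask the server whether it is running; a healthy server answers "imok".
echo ruok | nc zookeper_server.sys54.com 2181

# Dump server stats: connection count, latency, and leader/follower mode.
echo stat | nc zookeper_server.sys54.com 2181
```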
12-31-2018 05:23 PM
And after 10 minutes we get this:
```
netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34425 -
23.1.35.197:34416 -
23.1.35.197:34392 -
23.1.35.197:34389 -
23.1.35.197:34462 63468/java
23.1.35.197:34401 -
23.1.35.197:34358 -
23.1.35.197:34437 -
23.1.35.197:34361 -
23.1.35.197:34451 63468/java
23.1.35.197:34354 -
23.1.35.197:34360 -
23.1.35.197:34368 -
23.1.35.197:34444 -
23.1.35.197:34459 63468/java
23.1.35.197:34442 -
23.1.35.197:34391 -
23.1.35.197:34440 -
23.1.35.197:34452 63468/java
```
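To see how many of these sockets belong to a live process versus having no owner, the same netstat output can be aggregated per owning PID/program (a minimal sketch reusing the server address from above):
```bash
# Count ZooKeeper client sockets per owning PID/program ("-" means no owning process).
netstat -nape | awk '$5 == "23.1.35.197:2181" {count[$9]++} END {for (p in count) print p, count[p]}'
```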
12-31-2018 05:11 PM
We get this (after restarting the ZooKeeper service from Ambari):
```
netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34065 -
23.1.35.197:34071 -
23.1.35.197:34053 -
23.1.35.197:34066 -
23.1.35.197:34068 -
23.1.35.197:34079 63468/java
23.1.35.197:34082 63468/java
23.1.35.197:34052 -
23.1.35.197:34063 -
23.1.35.197:34069 -
23.1.35.197:34075 63468/java
23.1.35.197:34084 63468/java
23.1.35.197:34061 -
23.1.35.197:34078 63468/java
```
12-31-2018 04:40 PM
1. What application is running at that particular time? This can be caused by a bug in user code; check the offending application with netstat. - need to verify
2. Ensure that the configured maximum number of client connections is sufficient to avoid the loss of connections. - OK
3. Update the value of the maxClientCnxns configuration parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file on the ZooKeeper ensemble (see the sketch after this list). - OK
4. Ensure that you have no system issues with CPU, memory, disk input/output, or other system resources. - OK
5. ZooKeeper is sensitive to NTPD functionality; make sure the clocks are synchronized across the ensemble. - OK
6. Restart the ZooKeepers through Ambari. - no need, since we already restarted a couple of times with the same results
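For reference, a minimal zoo.cfg fragment showing where that parameter lives (the value 300 mirrors the one mentioned in the question below; the rest of the file is omitted):
```
# ZooKeeper-installation-directory/conf/zoo.cfg
# Maximum concurrent client connections per source host (0 = unlimited).
maxClientCnxns=300
```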
12-31-2018 03:51 PM
We checked all your comments and I don't see a problem, except for step 1, where you said: "What application is running at that particular time? This can be caused by a bug in user code; check the offending application with netstat." Can you please suggest how to verify this? (What exactly do we need to look for in the netstat output?)
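For context, one common way to spot a leaking client is to count sockets per remote address talking to the ZooKeeper port (a hedged sketch, assuming the standard port 2181; a remote host with a disproportionate count is the usual suspect):
```bash
# Tally established connections to port 2181 by remote host, highest count first.
netstat -nt | awk '$4 ~ /:2181$/ {split($5, a, ":"); count[a[1]]++} END {for (ip in count) print count[ip], ip}' | sort -rn
```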
12-31-2018 01:51 PM
Environment: HDP 2.6.4, Ambari 2.6.1, 3 ZooKeeper servers.

Hi all,

On the first ZooKeeper server it seems that even after closing, connections to ZooKeeper are not actually getting closed, which causes the maximum number of client connections from a host to be reached - we have maxClientCnxns set to 60 in the ZooKeeper config. As a result, when a new application comes along and tries to create a connection, it fails.

For example, when connections look like this:
```
echo stat | nc 23.1.35.185 2181
Latency min/avg/max: 0/71/399
Received: 3031
Sent: 2407
Connections: 67
Outstanding: 622
Zxid: 0x130000004d
Mode: follower
Node count: 3730
```

But after some time, when the connection count reaches ~70, we see:
```
echo stat | nc 23.1.35.185 2181
Ncat: Connection reset by peer.
```

And we can also see many sockets in CLOSE_WAIT:
```
java 58936 zookeeper 60u IPv6 381963738 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44983 (CLOSE_WAIT)
java 58936 zookeeper 61u IPv6 381963798 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:45034 (CLOSE_WAIT)
java 58936 zookeeper 62u IPv6 381963667 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44956 (CLOSE_WAIT)
java 58936 zookeeper 63u IPv6 381949363 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44911 (CLOSE_WAIT)
java 58936 zookeeper 64u IPv6 381964358 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44957 (CLOSE_WAIT)
java 58936 zookeeper 65u IPv6 381963638 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44940 (CLOSE_WAIT)
java 58936 zookeeper 66u IPv6 381963640 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44943 (CLOSE_WAIT)
java 58936 zookeeper 67u IPv6 381963642 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44945 (CLOSE_WAIT)
```

From the ZooKeeper log:
```
2018-12-26 02:50:46,382 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:46,429 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:46,849 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:47,645 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:47,845 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:48,180 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:49,035 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
```

In Ambari we can also see: Connection failed: [Errno 104] Connection reset by peer to zookeper_server.sys54.com.:2181

I must say that this is not happening on ZooKeeper servers 2 and 3. So, any hint why the connections are stuck in CLOSE_WAIT?

NOTE - increasing maxClientCnxns to 300 does not help, because after some time we get more than 300 connections and then we see in the log:
```
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
```
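To see exactly which clients hold those sockets, ZooKeeper's cons four-letter command lists every open connection with its client address and session details, and ss can tally the CLOSE_WAIT backlog (a sketch reusing the first server's address from above):
```bash
# List every client connection the ZooKeeper server currently tracks.
echo cons | nc 23.1.35.185 2181

# Count sockets stuck in CLOSE_WAIT on the ZooKeeper client port (skip the header line).
ss -tan state close-wait '( sport = :2181 )' | tail -n +2 | wc -l
```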
12-27-2018 10:30 AM
1 Kudo
Thank you, but one of my questions was about changing the YARN configuration by deleting /var/hadoop/yarn/local from YARN. After that, as you know, YARN requires a restart and we need to do it, but we are a little worried that the restart will fail because of this change. Or maybe it is safe to make this change? What is your opinion?
12-27-2018 09:48 AM
1 Kudo
@Jagadeesan A S, can you also look at my question at this link - https://community.hortonworks.com/questions/232062/yarn-local-dirs-safety-moving-the-local-dir-to-dat.html
12-27-2018 09:44 AM
Hi all,

We have an Ambari cluster with 134 DataNode machines. In YARN --> CONFIG, yarn.nodemanager.local-dirs is configured as follows:

/var/hadoop/yarn/local,/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local,/grid/sdd/hadoop/yarn/local,/grid/sde/hadoop/yarn/local,/grid/sdf/hadoop/yarn/local

We want to remove /var/hadoop/yarn/local from the configuration (see the sketch below), which also requires a YARN restart and maybe restarts of other services. We intend to do this to avoid writing to the local disk (/var). Since we have 143 DataNode machines in our Ambari cluster, we are worried about this action of removing /var/hadoop/yarn/local from yarn.nodemanager.local-dirs - or maybe we can do it safely? We would be happy to get Hortonworks' opinion. We know that it is generally not a good idea to use /hadoop/yarn/local for yarn.nodemanager.log-dirs, which hold container logs; typically we prefer to direct only these logs to all the data mount points (like /grid/sdb/hadoop/yarn/local).
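A minimal sketch of the intended change (property name and paths are taken from the post; only the /var entry is dropped):
```
# yarn.nodemanager.local-dirs -- current value
/var/hadoop/yarn/local,/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local,/grid/sdd/hadoop/yarn/local,/grid/sde/hadoop/yarn/local,/grid/sdf/hadoop/yarn/local

# yarn.nodemanager.local-dirs -- intended value after removing the /var entry
/grid/sdb/hadoop/yarn/local,/grid/sdc/hadoop/yarn/local,/grid/sdd/hadoop/yarn/local,/grid/sde/hadoop/yarn/local,/grid/sdf/hadoop/yarn/local
```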
12-27-2018 09:19 AM
Hi all,

Regarding the DataNode machines in our Ambari cluster: in YARN --> CONFIG we configured yarn.nodemanager.local-dirs to /var/hadoop/yarn/local, and as we can see, blockmgr-XXX folders are created there and take up a lot of space on the local disk (/var). Is it safe to remove the old blockmgr-XXX folders (say, from 1 day ago)?
```
9433484 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-d8628834-137a-4a7b-a833-69a47a4a8d53
9319472 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-67632cb1-d552-41b9-9944-31954964ad9c
8588556 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-d159be18-ca81-4974-9248-e79ac5db7981
7622108 /var/hadoop/yarn/local/usercache/airflow/appcache/application_1544689806134_0986/blockmgr-766b5a33-8478-4102-9457-47535ba57ddc
```
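For illustration only, a hedged sketch of how such folders could be listed before deciding anything (the path and the 1-day cutoff come from the post; whether deletion is safe depends on whether the owning application is still running, so this lists candidates rather than deleting them):
```bash
# List blockmgr-* directories untouched for more than a day, with their sizes; review before removing.
find /var/hadoop/yarn/local/usercache/*/appcache -maxdepth 2 -type d -name 'blockmgr-*' -mtime +1 -exec du -sh {} \;
```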