Member since: 05-22-2018
Posts: 40
Kudos Received: 0
Solutions: 0
09-17-2018
04:26 PM
I did try to rebalance, but it did not copy data to the new nodes.
09-17-2018
04:15 PM
Right, but that would not ensure all of the data is moved off the decommissioned nodes and onto the live nodes, right?
09-17-2018
01:32 PM
I thought decommissioning all of them would automatically copy the data to the commissioned nodes, but that did not happen. I currently have 4 DataNodes in the cluster. I need to remove all 4 of them and add 3 new ones instead, and of course I need to preserve the data. What's the way forward?
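For reference, here is a minimal sketch of the usual decommission flow, assuming the cluster is set up with a dfs.hosts.exclude file (the file path and hostname below are just examples; use whatever your NameNode configuration points to, or the Decommission action in Cloudera Manager):

# Add each of the 4 old DataNode hostnames to the HDFS exclude file (example path)
echo "old-worker-1.example.internal" >> /etc/hadoop/conf/dfs.exclude

# Ask the NameNode to re-read its include/exclude lists and begin decommissioning
hdfs dfsadmin -refreshNodes

# Watch progress: the old nodes should move from "Decommission in progress" to "Decommissioned"
hdfs dfsadmin -report

The important part is that the 3 new DataNodes are added and healthy before the decommission starts, so the NameNode has somewhere to re-replicate the blocks; the balancer on its own does not move data off decommissioned nodes. Only stop and remove the old DataNodes once all 4 show as Decommissioned.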
Tags:
- data-nodes
09-12-2018
12:48 PM
No configuration was changed when I started getting these alerts:
NameNode Connectivity: This DataNode is not connected to one or more of its NameNode(s).
Web server status: The Cloudera Manager Agent is not able to communicate with this role's web server.
So the DataNode is not connected to one or more of its NameNodes, and the Cloudera Manager Agent is also not getting a response from this role's web server. This is what the log looks like:
dwh-worker-4.c.abc-1225.internal ERROR September 12, 2018 5:33 PM DataNode
dwh-worker-4.c.abc-1225.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.31.10.74:44280 dst: /172.31.10.74:50010
java.io.IOException: Not ready to serve the block pool, BP-1423177047-172.31.4.192-1492091038346.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
The DataNodes are also randomly exiting:
dwh-worker-1.c.abc-1225.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.31.10.74:49848 dst: /172.31.4.147:50010
java.io.IOException: Not ready to serve the block pool, BP-1423177047-172.31.4.192-1492091038346.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
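As a first pass at narrowing this down, a hedged sketch of checks worth running (the /dfs/nn and /dfs/dn paths are Cloudera Manager defaults and may differ on your cluster):

# Which DataNodes does the NameNode currently consider live/dead?
hdfs dfsadmin -report

# Compare the clusterID recorded on the affected DataNode with the NameNode's;
# a mismatch is one possible cause of "Not ready to serve the block pool"
cat /dfs/dn/current/VERSION
cat /dfs/nn/current/VERSION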
Tags:
- datanodes
- Hadoop Core
Labels:
- Apache Hadoop
09-04-2018
05:26 AM
Also, I can see that I have over 100 connections on the port right now. Where can I remove or raise the limit on the number of allowed connections?
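A small sketch of how I'd look at this, assuming the limit in question is HiveServer2's Thrift worker-thread cap (the property name is the standard one, but verify it for your CDH version and set it via hive-site.xml or the HS2 safety valve):

# Count current client connections to HiveServer2 (default binary port 10000)
netstat -an | grep ':10000' | grep ESTABLISHED | wc -l

# The ceiling on concurrent Thrift connections is normally governed by
# hive.server2.thrift.max.worker.threads
grep -A1 'hive.server2.thrift.max.worker.threads' /etc/hive/conf/hive-site.xml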
09-04-2018
05:19 AM
That's probably the case. I don't see any errors in the HS2 logs, but I do see a jump in HiveServer2's open connections. It would be great if you could share how you resolved it.
09-01-2018
06:51 AM
We recently started using Tableau and allowed Tableau Online to access our Hive server. Since then, for about two hours every morning (10 to 12), our Hive queries fail with:
Connecting to jdbc:hive2://ip-xxx-xx-x-xxx.ap-south-1.compute.internal:10000/default
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ip-xxx-xx-x-xxx.ap-south-1.compute.internal:10000/default: java.net.SocketException: Connection reset (state=08S01,code=0)
No current connection
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]
If I try to connect manually, I can connect fine:
beeline -u jdbc:hive2://xxx-xx-x-xxx:10000/
I then have to remove Tableau's whitelisted IP, and after about 10 minutes the server comes back up. We do not have a lot of queries coming from Tableau. What could be the issue? I have removed the connection limit on ZooKeeper just in case it had anything to do with it. Any pointers?
Labels:
- Apache Hive
08-27-2018
09:38 AM
The public DNS of my master node changed. Now Oozie workflows will not trigger, and I cannot submit new coordinators. ec2-12-123-123-12.ap-south-1.compute.amazonaws.com is the old public DNS. The error is:
Failed to create deployment directory: HTTPConnectionPool(host='ec2-12-123-123-12.ap-south-1.compute.amazonaws.com', port=50070): Max retries exceeded with url: /webhdfs/v1/user/hue/oozie/deployments/_%24USER_-oozie-%24JOBID-%24TIME?op=GETFILESTATUS&user.name=hue&doas=hue (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fc5bcde5ed0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Please suggest what values need to be changed.
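Since the failure is Hue's WebHDFS call going to the old hostname, a hedged starting point is simply to find every config that still references it (paths below are typical CDH locations and may differ):

# Locate configs that still point at the old public DNS name
grep -rl 'ec2-12-123-123-12.ap-south-1.compute.amazonaws.com' /etc/hue/conf /etc/hadoop/conf /etc/oozie/conf 2>/dev/null

# In hue.ini the relevant setting is usually webhdfs_url under [hadoop] -> [[hdfs_clusters]]
grep -n 'webhdfs_url' /etc/hue/conf/hue.ini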
Labels:
- Apache Oozie
08-25-2018
06:12 PM
I cannot format the NameNode since this is my production cluster, but the NameNode is not starting because of this error. Please suggest what needs to be done to fix this safely.
org.apache.hadoop.hdfs.server.namenode.FSNamesystem Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:222)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1097)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:779)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
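"NameNode is not formatted" generally means the NameNode cannot find a valid fsimage in any of its dfs.namenode.name.dir locations, so before anything else it is worth confirming the metadata directory is still present, mounted, and readable. A hedged sketch, using the Cloudera Manager default path /dfs/nn as an example:

# The directory should contain fsimage_*, edits_* and VERSION files
ls -l /dfs/nn/current/ | head
cat /dfs/nn/current/VERSION

# If it is empty or missing, check whether the underlying volume is actually mounted
df -h /dfs/nn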
Tags:
- namenode
Labels:
- Apache Hadoop
07-02-2018
02:17 PM
My Sqoop jobs and Hive queries are randomly getting killed. All I get in the job logs is: Diagnostics: Application killed by a user. I know for sure that these jobs are not being killed by anyone. My RM is running in HA mode, and all my services are up and running without any warnings. I don't think it's a memory issue, since my servers have available memory and it happens even when very few jobs are running. Please help.
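One way to narrow this down is to find out which user and host actually issued the kill: the ResourceManager writes an audit line for every kill request. A rough sketch (the application id is a placeholder and the log path is a typical CDH location):

# Final status and diagnostics for one of the killed applications
yarn application -status application_1234567890123_0001

# Look for the corresponding kill request in the ResourceManager log; the audit line
# includes USER= and IP= fields identifying who sent it
grep -i 'Kill Application Request' /var/log/hadoop-yarn/*RESOURCEMANAGER*.log* | tail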
Labels:
- Apache Hive
- Apache Sqoop
- Apache YARN
06-11-2018
01:00 AM
Alright. But could a higher NameNode heap size possibly result in NodeManager exits? @Geoffrey Shelton Okot
06-10-2018
04:20 PM
I don't see an hs_err_pid log on the host; I checked using sudo find -name *hs_err_pid* @tsokorai The NodeManager log gives no clue at the end, nothing at all.
06-10-2018
03:58 PM
NameNode heap size is 5 GB and DataNode heap size is 2 GB. JVM options for the DataNode: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled. JVM options for the NameNode: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled. @Geoffrey Shelton Okot
06-10-2018
03:55 PM
I had to create the myid file and the data directory manually, and the service finally started on my host machines. However, I am still getting: Bad: Canary test failed to create an ephemeral znode. I also had to change the ownership of the /var/lib/zookeeper folder to the zookeeper user; it was previously owned by root. I have a feeling this error is also caused by a permission issue. How do I fix this?
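A minimal sketch of the ownership fix plus a quick check that a client can do roughly what the canary does, assuming the stock zookeeper user and dataDir (adjust to your install; the /canary_test path is just a throwaway example):

# ZooKeeper's data directory (myid, version-2/) should be owned by the user the server runs as
chown -R zookeeper:zookeeper /var/lib/zookeeper
ls -l /var/lib/zookeeper

# Verify a client can create and delete a znode, which is roughly what the canary test exercises
zookeeper-client -server localhost:2181 create /canary_test test
zookeeper-client -server localhost:2181 delete /canary_test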
Labels:
- Apache Zookeeper
06-10-2018
10:22 AM
About all the NodeManagers going down: on restarting the NodeManagers, I realized they were picking up a lot of containers from the yarn-nm-recovery directory, so I got rid of that folder. Now my NodeManagers are not all down, but they still run into continuous exits and I seem to have no way to debug this. I have allocated 2 GB of heap space and I can see they do not need more than a GB. The only thing that looks like it could be a problem is the number of waiting Java threads: about 40-50, with another 50-60 threads running at a time.
06-10-2018
09:10 AM
9 worker nodes. These only have HDFS and NodeManagers installed on them. These shutdowns are the result of continuous exits by the NodeManagers; however, I am not able to understand why my NodeManagers keep exiting. They run into unexpected exits even when there are only a handful of jobs running, and it keeps happening throughout the day. I have tried looking through the logs but I am not seeing any errors there. Any help would be really great. @Geoffrey Shelton Okot
06-10-2018
05:41 AM
My ResourceManagers are active and so is the JobHistory Server. All my worker nodes had been exiting randomly for some time but used to restart automatically; today, all my NodeManagers are down. What could be the reason? My worker nodes are typical, with HDFS and YARN on them. HDFS is running fine. What does it indicate when all the NodeManagers are down? There was no unusual load on the servers. Also, if I restart them, they go down again. Please suggest what could cause this.
Tags:
- node-manager
Labels:
- Apache YARN
- HDFS
06-07-2018
02:47 AM
Is anybody active here at all?
06-05-2018
12:14 PM
I am experiencing slightly odd behaviour from the NodeManagers and ResourceManager. I have about 9-10 NodeManagers in my cluster, and HA mode is enabled for the ResourceManager. Of the two nodes the RM runs on, whenever the active RM is on node1, my NodeManagers keep exiting; this behaviour is rare when the RM is active on node2. Given this, I would simply remove the RM from node1 and install it on another node, but as soon as I do, my Oozie jobs start getting killed with "JobTracker is not whitelisted on the Oozie server". Is that because my JobHistory Server is installed on node1, or something else? Please suggest. I don't see out-of-memory errors in the logs, so I am not sure what's wrong with the RM running on node1. I do have a lot of services like Hue, Oozie, Sentry, Hive, and Sqoop running on node1, none of which show any problems.
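On the "JobTracker is not whitelisted" part: Oozie only accepts JobTracker/ResourceManager and NameNode addresses listed in its whitelist properties, so moving the RM means the new address has to be allowed as well. A hedged sketch of what to check (property names are the standard Oozie ones; on CDH they are usually set via the Oozie safety valve rather than edited by hand):

# Which RM and NameNode endpoints does Oozie currently allow?
grep -A1 'oozie.service.HadoopAccessorService.jobTracker.whitelist' /etc/oozie/conf/oozie-site.xml
grep -A1 'oozie.service.HadoopAccessorService.nameNode.whitelist' /etc/oozie/conf/oozie-site.xml

# An empty value means any address is allowed; otherwise the new RM host:port has to be
# added and the Oozie server restarted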
Labels:
- Cloudera Manager
06-05-2018
07:34 AM
Here are a few things I notice randomly:
1. My Hive queries are really slow at times.
2. A bunch of NodeManagers exit at the same time.
My worker nodes only have YARN and HDFS on them, and my memory is not overcommitted either. The RM is running on a node that has sufficient memory and does not have YARN on it. My nodes are m4.4xlarge instances and I can see they are not being used to full capacity, yet my jobs are slow and get stuck a lot of the time. What could be the issue? My jobs are run through Oozie and I have allocated 2 GB to it. The NM and SNN have 5 GB of memory. All my services are in a healthy state except for the NodeManagers running into unexpected exits, and I am not able to find out why that is happening. Help.
Labels:
- Apache YARN
05-30-2018
06:53 AM
I have been running into container exit code 137, and this is what I got from one of my Sqoop job logs:
18/05/30 11:49:45 INFO mapreduce.Job: Running job: job_1527499476017_12588
18/05/30 11:49:52 INFO mapreduce.Job: Job job_1527499476017_12588 running in uber mode : false
18/05/30 11:49:52 INFO mapreduce.Job: map 0% reduce 0%
18/05/30 11:50:04 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:05 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:06 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000009_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000008_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:52:06 INFO mapreduce.Job: map 20% reduce 0%
What does this say about the error? My services are up and running and there have been no unexpected exits either.
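Exit code 137 is 128 + 9, i.e. the container process received SIGKILL from outside the JVM, which matches the "Killed by external signal" line. To get the container-side detail, pulling the aggregated logs for that application is usually the quickest next step; a sketch using the job id from the log above (log aggregation must be enabled):

# All container logs for the failed job
yarn logs -applicationId application_1527499476017_12588 | less

# If nothing useful shows up, check the OS log on the node that ran the attempt
# for the kernel OOM killer
dmesg | grep -i 'killed process'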
Labels:
- Apache Sqoop
- Apache YARN
05-30-2018
02:21 AM
I started getting this error randomly. I am not sure I understand it, and my services look healthy. Any pointers?
Failing Oozie Launcher, Hadoop job Id mismatch, action file [hdfs://ip-172-31-4-192.ap-south-1.compute.internal:8020/user/hue/oozie-oozi/0000615-180528145319004-oozie-oozi-W/sqoop-9885--sqoop/0000615-180528145319004-oozie-oozi-W@sqoop-9885@0] declares Id [null] current Id [job_1527499476017_10822]
Launcher config error Hadoop job Id mismatch, action file [hdfs://ip-172-31-4-192.ap-south-1.compute.internal:8020/user/hue/oozie-oozi/0000615-180528145319004-oozie-oozi-W/sqoop-9885--sqoop/0000615-180528145319004-oozie-oozi-W@sqoop-9885@0] declares Id [null] current Id [job_1527499476017_10822]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-172-31-4-192.ap-south-1.compute.internal:8020/user/hue/oozie-oozi/0000615-180528145319004-oozie-oozi-W/sqoop-9885--sqoop/action-data.seq
Successfully reset security manager from null to null
Oozie Launcher ends
Tags:
- hadoop
- Hadoop Core
Labels:
- Apache Hadoop
05-28-2018
06:25 AM
I am not using Spark. Both Hive and Sqoop jobs were getting killed. I increased the number of attempts to 5 and the Sqoop jobs are fine now, but the Hive jobs are still getting stuck. Also, instead of the 137 error, all my NodeManagers are now running into unexpected exits. I can see about 181 timed-waiting threads in the ResourceManager, but JVM heap memory usage looks fine.
05-28-2018
05:55 AM
I have 9 NodeManagers in my cluster and all of them are randomly exiting. I can see that the number of Java threads is suddenly quite high, which was not the case before. What could cause all the NodeManagers to exit randomly? To reiterate: this is not specific to one or two NodeManagers.
05-25-2018
05:20 PM
I am using the --warehouse-dir argument to load data into HDFS before Sqoop puts it into Hive, and I am running all my Sqoop jobs through Oozie. If a task fails for some reason it is reattempted, and the problem is that the warehouse directory created by the previous attempt is still there, so the re-attempt fails with "output directory already exists". I understand I could use the --direct argument to skip the intermediate HDFS loading step, but I also need the --hive-drop-import-delims argument, and that combination is not supported. Advice, please? It's important.
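One common workaround, rather than switching to --direct, is to make the retry idempotent by deleting the intermediate directory before each attempt; in an Oozie workflow that can go in a <prepare>/<delete> element on the Sqoop action, or as an explicit cleanup step beforehand. A hedged sketch (the HDFS path is a placeholder for whatever --warehouse-dir points to):

# Remove leftover staging output from a previous failed attempt
# (-f keeps the command from failing when the directory does not exist)
hdfs dfs -rm -r -f -skipTrash /user/etl/sqoop_warehouse/<table_name>

# ...then run the Sqoop import with --warehouse-dir pointing at the same path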
Labels:
- Apache Sqoop
05-25-2018
01:22 PM
@Harald Berghoff: Thank you for your response. I feel like I am in deep trouble and really need some help here. I have checked dmesg and it has not recorded any killed processes. We have all our jobs scheduled through Oozie and we depend heavily on scheduled jobs. RAM on the worker nodes, right? My worker nodes have 64 GB of RAM and I can see free memory on the nodes. From the ResourceManager I can see vCores getting used up before memory; the cluster has 225 GB of memory and 54 vCores. For hosts I am using m4.4xlarge instances. I can share my YARN configuration if you would like. Is there a way I can get some professional help here? I am okay with paid support for this issue.
05-25-2018
12:53 AM
Container exited with a non-zero exit code 137 Killed by external signal
This error randomly kills Hive and Sqoop jobs. Is there anyone here who is willing to help? I have been trying to get an answer but no luck so far. As for checking logs: I have gone through the container logs, ResourceManager logs, and service-specific logs, and there is really nothing that points out why this error would be happening. I am using m4.4xlarge instances from AWS, with:
yarn.nodemanager.resource.memory-mb: 50 GiB
Java Heap Size of ResourceManager in Bytes: 2 GB
yarn.scheduler.maximum-allocation-mb: 25 GB
Java Heap Size of NodeManager in Bytes: 2 GB
yarn.nodemanager.resource.cpu-vcores: 14
yarn.scheduler.maximum-allocation-vcores: 8
The yarn.nodemanager.resource.cpu-vcores and yarn.scheduler.maximum-allocation-vcores values are different because I have NodeManager groups, and some of the instances are m4.2xlarge, which have 8 CPUs available to the NodeManager; I have therefore taken the minimum of the two for yarn.scheduler.maximum-allocation-vcores. Please suggest if there is something off in my configuration. This error happens randomly, even when there are not a lot of jobs running.
Labels:
- Apache YARN