Member since
05-22-2018
40
Posts
0
Kudos Received
0
Solutions
09-12-2018
12:48 PM
Nothing in the configuration changed when I started getting these health alerts:
NameNode Connectivity: This DataNode is not connected to one or more of its NameNode(s).
Web Server Status: The Cloudera Manager Agent is not able to communicate with this role's web server.
So the DataNode is not connected to one or more of its NameNodes, and on top of that the Cloudera Manager Agent is not getting a response from the role's web server. This is what the DataNode log looks like:
dwh-worker-4.c.abc-1225.internal ERROR September 12, 2018 5:33 PM DataNode
dwh-worker-4.c.abc-1225.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.31.10.74:44280 dst: /172.31.10.74:50010
java.io.IOException: Not ready to serve the block pool, BP-1423177047-172.31.4.192-1492091038346.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
Also, the DataNodes are randomly exiting:
dwh-worker-1.c.abc-1225.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.31.10.74:49848 dst: /172.31.4.147:50010
java.io.IOException: Not ready to serve the block pool, BP-1423177047-172.31.4.192-1492091038346.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
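For reference, here is roughly how I am checking the DataNode side of this. The NameNode host, RPC port, and data directory below are placeholders for my own layout, so treat it as a sketch rather than exact commands:

# Does the NameNode still list this DataNode at all?
hdfs dfsadmin -report

# Can the DataNode host reach the NameNode RPC port?
# (<namenode-host> and 8020 are placeholders for my setup.)
nc -vz <namenode-host> 8020

# Does the DataNode's clusterID match the NameNode's?
# A mismatch also stops the block pool from being served.
# /dfs/dn is the DataNode data directory on my hosts; yours may differ.
grep clusterID /dfs/dn/current/VERSION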
Labels: Apache Hadoop
09-04-2018
05:26 AM
Also, I can see over 100 connections on the port right now. Where do I remove (or raise) the limit on the allowed number of connections?
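For context, this is roughly how I am counting the connections and where I think the two usual limits live; port 10000 and the zoo.cfg path are my cluster's defaults and may differ elsewhere:

# Current connections to the HiveServer2 thrift port (10000 on my cluster):
netstat -tan | grep ':10000' | grep -c ESTABLISHED

# ZooKeeper's per-client connection cap, in case that is the limit being hit:
grep maxClientCnxns /etc/zookeeper/conf/zoo.cfg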
09-04-2018
05:19 AM
That's probably the case. Could you share how you resolved the problem? I don't see any errors in the HiveServer2 logs, but I do see a jump in HiveServer2's open connections. It would be great if you could share your fix.
09-01-2018
06:51 AM
We recently started using Tableau and allowed Tableau Online to access our Hive server. Since then, for about two hours every morning (roughly 10:00 to 12:00), our Hive queries fail with:
Connecting to jdbc:hive2://ip-xxx-xx-x-xxx.ap-south-1.compute.internal:10000/default
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ip-xxx-xx-x-xxx.ap-south-1.compute.internal:10000/default: java.net.SocketException: Connection reset (state=08S01,code=0)
No current connection
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]
If I try to connect manually, I can connect fine: beeline -u jdbc:hive2://xxx-xx-x-xxx:10000/
I then have to remove Tableau's whitelisted IP, and after about 10 minutes everything comes back up. We do not run a lot of queries from Tableau. What could be the issue? I have removed the ZooKeeper connection limit just in case it had anything to do with it. Any pointers?
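In case it helps, this is the kind of check I plan to run during that 10:00 to 12:00 window. The port and the property name are stock HiveServer2 defaults, so this is only a sketch:

# Watch how many JDBC/Tableau connections HiveServer2 is holding open:
watch -n 60 "netstat -tan | grep ':10000' | grep -c ESTABLISHED"

# And what the current thrift worker-thread cap is:
beeline -u jdbc:hive2://localhost:10000/ -e 'set hive.server2.thrift.max.worker.threads;'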
Labels: Apache Hive
06-10-2018
03:55 PM
I had to create the myid file and the data directory manually. The service finally started on my hosts, but I am still getting: Bad: Canary test failed to create an ephemeral znode. I also had to change the ownership of the /var/lib/zookeeper directory to the zookeeper user (it was previously owned by root). I have a feeling this error is also caused by a permission issue. How do I fix it?
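For reference, this is what I did for the ownership change and how I am trying to reproduce the canary's check by hand. The zookeeper:zookeeper user/group, localhost:2181, and the znode path are assumptions from my CM-managed hosts:

# Ownership fix on the ZooKeeper data directory:
sudo chown -R zookeeper:zookeeper /var/lib/zookeeper

# Do what the canary does: create, then delete, an ephemeral znode.
zookeeper-client -server localhost:2181 create -e /canary_test_manual hello
zookeeper-client -server localhost:2181 delete /canary_test_manual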
Labels: Apache Zookeeper
05-28-2018
06:25 AM
I am not using Spark. Both Hive and Sqoop jobs were getting killed. I increased the number of attempts to 5; the Sqoop jobs are fine now, but the Hive jobs are still getting stuck. Also, instead of the exit code 137 error, all my NodeManagers are now running into an unexpected exit error. I can see about 181 timed-waiting threads in the ResourceManager, but JVM heap memory usage looks fine.
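This is roughly how I am looking at those timed-waiting threads, in case the output helps (it assumes the ResourceManager runs as the yarn user):

# Find the ResourceManager PID, then count its timed-waiting threads:
sudo -u yarn jps | grep ResourceManager
sudo -u yarn jstack <resourcemanager-pid> | grep -c TIMED_WAITING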
05-25-2018
01:22 PM
@Harald Berghoff: Thank you for your response. I feel like I am in deep trouble and really, really need some help here. I have checked dmesg and it has not recorded any killed processes. All our jobs are scheduled through Oozie and we depend heavily on those scheduled jobs. RAM on the worker nodes, right? My worker nodes have 64 GB RAM and I can see free memory on them. From the ResourceManager I can see vCores getting used up before memory. The cluster has 225 GB of memory and 54 vCores. The hosts are m4.4xlarge instances. I can share my YARN configuration if you would like. Is there a way I can get some professional help here? I am okay with paid support for this issue.
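To make the vCores-before-memory observation concrete, this is the back-of-the-envelope ratio I am going by (my own arithmetic, not something from the RM UI):

# 225 GB of cluster memory spread over 54 vCores:
echo 'scale=1; 225/54' | bc    # ~4.1 GB of memory per vCore
# So if a typical container asks for 1 vCore and less than ~4 GB,
# the scheduler runs out of vCores before it runs out of memory,
# which matches what I see in the ResourceManager UI.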
05-25-2018
12:53 AM
Container exited with a non-zero exit code 137 Killed by external signal
This error randomly kills Hive and Sqoop jobs. Is there anyone here willing to help? I have been trying to get an answer but no luck. As for checking logs: I have checked the container logs, the ResourceManager logs, and the service-specific logs, and there is really nothing that points to why this error happens. I am using m4.4xlarge instances from AWS.
yarn.nodemanager.resource.memory-mb: 50 GiB
Java Heap Size of ResourceManager in Bytes: 2 GB
yarn.scheduler.maximum-allocation-mb: 25 GB
Java Heap Size of NodeManager in Bytes: 2 GB
yarn.nodemanager.resource.cpu-vcores: 14
yarn.scheduler.maximum-allocation-vcores: 8
yarn.nodemanager.resource.cpu-vcores and yarn.scheduler.maximum-allocation-vcores differ because I have NodeManager groups, and some of the instances are m4.2xlarge, which have 8 vCores for the NodeManager; therefore I took the minimum of the two for yarn.scheduler.maximum-allocation-vcores: 8. Please suggest if there is something off in my configuration. This error happens randomly, even when there are not many jobs running.
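For what it's worth, exit code 137 is 128 + 9, i.e. the container JVM was killed with SIGKILL, so the next checks I plan to run on a node that hosted a failed container are below. The NodeManager log path is the CDH default and may differ on other hosts:

# (a) The kernel OOM killer:
dmesg | grep -i -E 'killed process|out of memory'

# (b) YARN's own memory enforcement - the NodeManager logs a
#     "running beyond physical/virtual memory limits" message before it kills a container:
grep -i 'beyond.*memory limits' /var/log/hadoop-yarn/*.log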
Labels: Apache YARN