Member since 05-22-2018
40 Posts
0 Kudos Received
0 Solutions
09-12-2018
12:48 PM
No configuration had changed when I started getting these two health issues:
NameNode Connectivity: This DataNode is not connected to one or more of its NameNode(s).
Web server status: The Cloudera Manager Agent is not able to communicate with this role's web server.
So the DataNode has lost contact with one or more of its NameNodes, and the Cloudera Manager Agent is getting no response from the role's web server. This is what the log looks like:
dwh-worker-4.c.abc-1225.internal ERROR September 12, 2018 5:33 PM DataNode
dwh-worker-4.c.abc-1225.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.31.10.74:44280 dst: /172.31.10.74:50010
java.io.IOException: Not ready to serve the block pool, BP-1423177047-172.31.4.192-1492091038346.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
Also, the DataNodes are randomly exiting:
dwh-worker-1.c.abc-1225.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.31.10.74:49848 dst: /172.31.4.147:50010
java.io.IOException: Not ready to serve the block pool, BP-1423177047-172.31.4.192-1492091038346.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
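To see whether the "Not ready to serve the block pool" failures cluster on particular workers, a small log scan can help. This is a minimal sketch, assuming the DataNode log uses the DataXceiver header format quoted above, with the IOException either on the same line or on the next one; the script name and log path are placeholders.

```python
import re
import sys
from collections import Counter

# Header line of a DataXceiver failure, e.g.
# "<host>:50010:DataXceiver error processing WRITE_BLOCK operation src: /ip:port dst: /ip:port"
XCEIVER_ERR = re.compile(
    r"(?P<host>\S+):\d+:DataXceiver error processing (?P<op>\w+) operation\s+"
    r"src: /[\d.]+:\d+\s+dst: /[\d.]+:\d+"
)
BP_NOT_READY = "Not ready to serve the block pool"


def summarize_bp_errors(log_text):
    """Count 'Not ready to serve the block pool' failures per (DataNode host, operation).

    The exception text is accepted either on the same line as the DataXceiver
    header or on the line right after it.
    """
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = XCEIVER_ERR.search(line)
        if not m:
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if BP_NOT_READY in line or BP_NOT_READY in next_line:
            counts[(m.group("host"), m.group("op"))] += 1
    return counts


if __name__ == "__main__":
    # Usage (placeholder path): python bp_errors.py /path/to/datanode.log
    with open(sys.argv[1]) as f:
        for (host, op), n in summarize_bp_errors(f.read()).most_common():
            print(f"{host}\t{op}\t{n}")
```

If the counts pile up on one host right after its NameNode connectivity alert, that would fit a DataNode that has not yet re-registered its block pool, though the sketch only counts occurrences and does not establish a cause.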
Labels: Apache Hadoop
09-04-2018
05:26 AM
Also, I can see that there are currently over 100 connections on the port. Where can I remove the limit on the number of allowed connections?
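Before changing any limit, it can help to track the connection count over time on the server itself. This is a minimal sketch, assuming a Linux host and assuming the port in question is the HiveServer2 Thrift port 10000 from the JDBC URL in the related thread (the post does not name the port). It reads the kernel's /proc/net/tcp and /proc/net/tcp6 tables and counts ESTABLISHED sockets on that local port.

```python
import os

ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp and /proc/net/tcp6


def count_established(proc_net_text, port):
    """Count ESTABLISHED sockets whose *local* port equals `port`.

    Each data row looks like 'sl local_address rem_address st ...', where
    local_address is HEXIP:HEXPORT and st is a two-digit hex state code.
    """
    count = 0
    for line in proc_net_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == ESTABLISHED:
            count += 1
    return count


if __name__ == "__main__":
    PORT = 10000  # assumed HiveServer2 Thrift port, taken from the JDBC URL in the thread
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                total += count_established(f.read(), PORT)
    print(f"ESTABLISHED connections on local port {PORT}: {total}")
```

Running this from cron every minute or so would show whether the count climbs in step with the morning failures, which is more informative than a single snapshot.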
09-04-2018
05:19 AM
That's probably the case. Could you share how you resolved the problem? I don't see any problems in the HS2 logs, but I do see a jump in the number of open HiveServer2 connections. It would be great if you could share how you resolved it.
09-01-2018
06:51 AM
We recently started using Tableau and allowed Tableau Online to access our Hive server. Since then, for about 2 hours every morning (roughly 10:00 to 12:00), our Hive queries fail with:
Connecting to jdbc:hive2://ip-xxx-xx-x-xxx.ap-south-1.compute.internal:10000/default
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ip-xxx-xx-x-xxx.ap-south-1.compute.internal:10000/default: java.net.SocketException: Connection reset (state=08S01,code=0)
No current connection
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]
If I try to connect manually, it works fine:
beeline -u jdbc:hive2://xxx-xx-x-xxx:10000/
To recover, I have to remove the whitelisted Tableau IP, and after about 10 minutes the connection comes back. We do not run many queries from Tableau. What could be the issue? Just in case it had anything to do with it, I have also removed the ZooKeeper limit on accepted connections. Any pointers?
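To pin down exactly when HiveServer2 stops accepting connections (and whether it lines up with Tableau's schedule), a simple reachability probe can log state changes. This is a minimal sketch under two assumptions: the host name below is a hypothetical placeholder (the real one is elided in the post), and a plain TCP connect only shows that the listener accepts sockets; it does not perform the Thrift/SASL handshake that beeline does, so beeline remains the end-to-end check.

```python
import socket
import time
from datetime import datetime


def port_accepts(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, reset, timed out, or name resolution failed
        return False


def probe_loop(host, port, interval=60.0):
    """Poll forever, printing a timestamped line whenever reachability changes."""
    last = None
    while True:
        ok = port_accepts(host, port)
        if ok != last:
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {host}:{port} reachable={ok}", flush=True)
            last = ok
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical host name; substitute the real HiveServer2 host.
    probe_loop("hs2.example.internal", 10000)
```

Only transitions are printed, so a morning's output stays short and the failure window's start and end times can be compared directly with the Tableau refresh schedule.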
Labels: Apache Hive
06-11-2018
01:00 AM
Alright. But could a higher NameNode heap size possibly cause NodeManager exits? @Geoffrey Shelton Okot
06-10-2018
03:58 PM
NameNode heap size is 5 GB and DataNode heap size is 2 GB.
JVM options for the DataNode: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
JVM options for the NameNode: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
@Geoffrey Shelton Okot
06-10-2018
03:55 PM
I had to create the myid file and the data directory manually. The service finally started on my hosts, but I am still getting: "Bad: Canary test failed to create an ephemeral znode." I also had to change the ownership of the var/lib/zookeeper folder to zookeeper-user; it was previously set to root. I suspect this error is also caused by a permissions issue. How do I fix this?
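Since the myid file and data directory had to be created by hand, it may be worth scripting that step so every host gets the right id and the ownership can be checked afterwards. This is a minimal sketch, assuming a zoo.cfg with standard server.N=host:port:port lines (under Cloudera Manager the generated config lives in the role's process directory); the host names in the comments are hypothetical, and the sketch only reports the owner rather than calling chown, which would need root.

```python
import os
import pwd
import re

# Matches ensemble entries such as "server.2=zk-b:2888:3888"
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):")


def myid_for_host(zoo_cfg_text, hostname):
    """Return N from the 'server.N=host:...' line whose host equals `hostname`."""
    for line in zoo_cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m and m.group(2) == hostname:
            return int(m.group(1))
    raise ValueError(f"{hostname} is not listed as a server in zoo.cfg")


def write_myid(data_dir, server_id):
    """Create dataDir if needed and write the server id (as text) into dataDir/myid."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, "myid")
    with open(path, "w") as f:
        f.write(f"{server_id}\n")
    return path


def owner_of(path):
    """User name that owns `path`, e.g. to confirm it is no longer root."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name
```

For example, on a hypothetical host zk-b listed as server.2, `write_myid(data_dir, myid_for_host(cfg_text, "zk-b"))` writes "2" into the myid file, and `owner_of` on the data directory and its files should then report the ZooKeeper service user rather than root.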
Labels: Apache Zookeeper
06-10-2018
10:22 AM
About all NodeManagers going down: on restarting the NodeManagers, I realized they were picking up a lot of containers from yarn-nm-recovery, so I got rid of that folder. Now the NodeManagers are no longer all down, but they still keep running into continuous exits, and I seem to have no way to debug this. I have allocated 2 GB of heap and I can see it never needs more than 1 GB. The only thing I see that could be a problem is the number of Java threads: about 40-50 waiting and 50-60 running at any given time.
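To turn the thread counts into something comparable across restarts, a thread dump can be tallied by state. This is a minimal sketch, assuming a dump captured with the JDK's `jstack <pid>` against the NodeManager process; the file names are illustrative. WAITING threads are often idle pool threads parked until work arrives, so the number alone may not indicate a problem, but a count that keeps growing between dumps would be worth a closer look.

```python
import re
import sys
from collections import Counter

# jstack prints one of these per thread, e.g. "   java.lang.Thread.State: WAITING (parking)"
STATE_LINE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def thread_state_counts(dump_text):
    """Tally threads by java.lang.Thread.State in a jstack-style thread dump."""
    return Counter(STATE_LINE.findall(dump_text))


if __name__ == "__main__":
    # Usage (illustrative): jstack <nodemanager-pid> > nm-threads.txt
    #                       python thread_states.py nm-threads.txt
    with open(sys.argv[1]) as f:
        for state, n in thread_state_counts(f.read()).most_common():
            print(f"{state:15} {n}")
```

Taking a dump a few minutes before an expected exit and comparing it with one taken right after startup tends to show more than a single snapshot.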
06-10-2018
09:10 AM
There are 9 worker nodes, with only HDFS and NodeManagers installed on them. These shutdowns are the result of continuous exits by the NodeManager, but I cannot understand why my NodeManagers keep exiting. They exit unexpectedly even when only a handful of jobs are running, and it keeps happening throughout the day. I have looked through the logs but do not see any errors there. Any help would be really great. @Geoffrey Shelton Okot