Created 05-29-2018 07:39 PM
I am getting a strange issue with 3 out of 8 data nodes in our HDP 2.6.0 cluster. These 3 data nodes are not reporting the correct number of blocks and are also not sending block reports to the NameNode at regular intervals.
Ambari is reporting:
[Alert][datanode_storage] Unable to extract JSON from JMX response
Any suggestions on what is wrong with our cluster?
Thanks in advance for your assistance.
Created 05-29-2018 09:50 PM
This JMX response error typically points to one of three reasons why the DataNode was not accessible.
The message comes from the "/usr/lib/python2.6/site-packages/ambari_agent/alerts/metric_alert.py" script, and the following is the logic:
if isinstance(self.metric_info, JmxMetric):
    jmx_property_values, http_code = self._load_jmx(alert_uri.is_ssl_enabled, host, port, self.metric_info)
    if not jmx_property_values and http_code in [200, 307]:
        collect_result = self.RESULT_UNKNOWN
        value_list.append('HTTP {0} response (metrics unavailable)'.format(str(http_code)))
    elif not jmx_property_values and http_code not in [200, 307]:
        raise Exception("[Alert][{0}] Unable to extract JSON from JMX response".format(self.get_name()))
    else:
        value_list.extend(jmx_property_values)
        check_value = self.metric_info.calculate(value_list)
        value_list.append(check_value)
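To see exactly what the alert is reading, you can query the same JMX servlet yourself. A minimal sketch (the hostname is a placeholder; 50075 is the default DataNode HTTP port in HDP 2.x, use the HTTPS port instead if SSL is enabled):

import json
import urllib2  # Python 2, same environment as the Ambari agent

# Placeholder host; default DataNode HTTP port in HDP 2.x is 50075.
url = "http://datanode.example.com:50075/jmx"

try:
    raw = urllib2.urlopen(url, timeout=10).read()
    data = json.loads(raw)  # if this raises, the Ambari alert fails the same way
    for bean in data.get("beans", []):
        if "FSDatasetState" in bean.get("name", ""):
            print(bean.get("name"))
            print("  Capacity : %s" % bean.get("Capacity"))
            print("  Remaining: %s" % bean.get("Remaining"))
except ValueError as e:
    print("Response was not valid JSON: %s" % e)
except Exception as e:
    print("Could not reach the JMX endpoint: %s" % e)

If this returns nothing or the response is not valid JSON, the alert code above raises exactly the "Unable to extract JSON from JMX response" exception you are seeing.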
1. Network issues
MTU (Maximum Transmission Unit) is related to TCP/IP networking in Linux. It refers to the size (in bytes) of the largest datagram that a given layer of a communications protocol can pass at a time. It should be identical on all the nodes. MTU is set in /etc/sysconfig/network-scripts/ifcfg-ethX.
You can check the current MTU setting under Linux with:
$ netstat -i
(check the MTU column) or
$ ip link list
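If you want to compare the MTU across all the DataNodes in one pass, a small sketch along these lines works (assuming passwordless SSH from the node you run it on; the host list and the eth0 interface name are placeholders to adjust):

import subprocess

# Placeholder host list and interface name; replace with your DataNodes.
hosts = ["dn1.example.com", "dn2.example.com", "dn3.example.com"]
iface = "eth0"

for host in hosts:
    try:
        # /sys/class/net/<iface>/mtu holds the current MTU as plain text.
        out = subprocess.check_output(
            ["ssh", host, "cat", "/sys/class/net/%s/mtu" % iface])
        print("%s: MTU %s" % (host, out.strip()))
    except subprocess.CalledProcessError as e:
        print("%s: could not read MTU (%s)" % (host, e))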
- Check the hosts file on the failing nodes
- Check whether the DNS server is having problems with name resolution
Run TestDFSIO performance tests
Write test:
yarn jar /usr/hdp/2.x.x.x.x/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -write -nrFiles 100 -fileSize 100
Read test:
yarn jar /usr/hdp/2.x.x.x.x/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -read -nrFiles 100 -fileSize 100
Iperf
iperf is a widely used tool for network performance measurement and tuning.
See Typical HDP Cluster Network Configuration Best Practices
2. DataNode is down
Restart the datanode using Ambari or manually
3. Garbage Collection
Run GCViewer on the collected GC log to analyse it.
Enable GC logging for the DataNode service in hadoop-env. If your current setting looks like:
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote -Xms2048m -Xmx2048m -Dhadoop.security.logger=ERROR,DRFAS $HADOOP_DATANODE_OPTS"
add the GC logging flags:
-verbose:gc -XX:+PrintGCDetails -Xloggc:${HADOOP_LOG_DIR}/hadoop-hdfs-datanode-`date +'%Y%m%d%H%M'`.gclog -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20
so that it becomes:
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote -Xms2048m -Xmx2048m -Dhadoop.security.logger=ERROR,DRFAS -verbose:gc -XX:+PrintGCDetails -Xloggc:${HADOOP_LOG_DIR}/hadoop-hdfs-datanode-`date +'%Y%m%d%H%M'`.gclog -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 $HADOOP_DATANODE_OPTS"
The GC log should give you detailed information.
Hope that helps
Created 08-06-2020 07:18 AM
UNKNOWN | [AMBARI_METRICS] |
UNKNOWN | Metrics Collector - HBase Master CPU Utilization [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response |
Is this a big issue? Is it anything to worry about?
Another alert is:
OK | [AMBARI_METRICS] |
OK | Metrics Collector - HBase Master CPU Utilization 12 CPU, load 12.8% |
Please advise me; I am waiting for your kind response.
Created 05-30-2018 06:50 PM
Thank you so much @Geoffrey Shelton Okot for assistance on this. I really appreciate it.
1. The MTU setting is the same for all our data nodes; I have verified it.
2. I have performed the TestDFSIO test. Please see the attachment for the test results.
3. Enabled GC debugging. My hadoop-env template looks like below.
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=800m -XX:MaxNewSize=800m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=ERROR,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC"
After enabling GC debugging and restarting the NameNodes and DataNodes, the below alarm disappeared:
Unable to extract JSON from JMX response error
But now I am getting the below error on the problematic data node in hadoop-hdfs-datanode-.log:
2018-05-30 19:53:32,985 WARN datanode.DataNode (BPServiceActor.java:offerService(673)) - IOException in offerService
java.io.EOFException: End of File Exception between local host is: "datanodehost/"; destination host is: "Namenodehost":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1498)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy15.blockReport(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:211)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:374)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:645)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:785)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1119)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1014)
2018-05-30 19:53:33,100 INFO datanode.DataNode (DataXceiver.java:writeBlock(669)) - Receiving BP-1033621575--1507285615620:blk_1461467777_387788610 src: /:42658 dest: /:50010
2018-05-30 19:53:33,878 INFO datanode.DataNode (DataXceiver.java:writeBlock(669)) - Receiving BP-1033621575--1507285615620:blk_1461467782_387788615 src: /:43782 dest: /:50010
2018-05-30 19:53:36,197 INFO datanode.DataNode (DataXceiver.java:writeBlock(669)) - Receiving BP-1033621575--1507285615620:blk_1368137451_294431710 src: /:52176 dest: /:50010
GC.log
9239114K(31375360K), 0.0954324 secs] [Times: user=0.75 sys=0.00, real=0.10 secs]
2018-05-30T20:37:23.000+0200: 15180.545: [GC (Allocation Failure) 2018-05-30T20:37:23.000+0200: 15180.545: [ParNew: 733378K->81919K(737280K), 0.0994234 secs] 9892898K->9739137K(31375360K), 0.0996623 secs] [Times: user=0.78 sys=0.01, real=0.10 secs]
2018-05-30T20:37:29.962+0200: 15187.508: [GC (Allocation Failure) 2018-05-30T20:37:29.963+0200: 15187.508: [ParNew: 727808K->81689K(737280K), 0.1043798 secs] 10385026K->10379938K(31375360K), 0.1046235 secs] [Times: user=0.83 sys=0.00, real=0.11 secs]
2018-05-30T20:37:33.884+0200: 15191.430: [GC (Allocation Failure) 2018-05-30T20:37:33.885+0200: 15191.430: [ParNew: 733664K->81919K(737280K), 0.1201577 secs] 11031913K->10881691K(31375360K), 0.1203890 secs] [Times: user=0.95 sys=0.00, real=0.12 secs]
2018-05-30T20:37:41.029+0200: 15198.574: [GC (Allocation Failure) 2018-05-30T20:37:41.029+0200: 15198.575: [ParNew: 727734K->78326K(737280K), 0.1015139 secs] 11527506K->11522912K(31375360K), 0.1017500 secs] [Times: user=0.81 sys=0.00, real=0.10 secs]
2018-05-30T20:37:44.780+0200: 15202.325: [GC (Allocation Failure) 2018-05-30T20:37:44.780+0200: 15202.325: [ParNew: 730789K->81920K(737280K), 0.0937630 secs] 12175374K->12020024K(31375360K), 0.0939903 secs] [Times: user=0.74 sys=0.00, real=0.09 secs]
2018-05-30T20:37:51.818+0200: 15209.363: [GC (Allocation Failure) 2018-05-30T20:37:51.818+0200: 15209.363: [ParNew: 723037K->78409K(737280K), 0.1089323 secs] 12661141K->12638859K(31375360K), 0.1091735 secs] [Times: user=0.87 sys=0.01, real=0.11 secs]
2018-05-30T20:37:55.071+0200: 15212.616: [GC (Allocation Failure) 2018-05-30T20:37:55.071+0200: 15212.616: [ParNew: 733424K->81919K(737280K), 0.0912281 secs] 13293874K->13139143K(31375360K), 0.0914462 secs] [Times: user=0.72 sys=0.00, real=0.09 secs]
2018-05-30T20:38:02.582+0200: 15220.127: [GC (Allocation Failure) 2018-05-30T20:38:02.582+0200: 15220.127: [ParNew: 731000K->80436K(737280K), 0.1039197 secs] 13788224K->13781232K(31375360K), 0.1041447 secs] [Times: user=0.82 sys=0.00, real=0.10 secs]
2018-05-30T20:38:05.811+0200: 15223.356: [GC (Allocation Failure) 2018-05-30T20:38:05.811+0200: 15223.356: [ParNew: 734976K->81919K(737280K), 0.0843448 secs] 14435772K->14285826K(31375360K), 0.0845672 secs] [Times: user=0.67 sys=0.00, real=0.09 secs]
2018-05-30T20:38:13.249+0200: 15230.794: [GC (Allocation Failure) 2018-05-30T20:38:13.249+0200: 15230.794: [ParNew: 725770K->80833K(737280K), 0.0967994 secs] 14929677K->14924119K(31375360K), 0.0970191 secs] [Times: user=0.76 sys=0.00, real=0.10 secs]
2018-05-30T20:38:16.685+0200: 15234.231: [GC (Allocation Failure) 2018-05-30T20:38:16.686+0200: 15234.231: [ParNew: 735203K->81920K(737280K), 0.0984436 secs] 15578489K->15419615K(31375360K), 0.0986753 secs] [Times: user=0.78 sys=0.00, real=0.10 secs]
2018-05-30T20:38:24.385+0200: 15241.930: [GC (Allocation Failure) 2018-05-30T20:38:24.385+0200: 15241.930: [ParNew: 735008K->79750K(737280K), 0.0981608 secs] 16072704K->16066284K(31375360K), 0.0983850 secs] [Times: user=0.78 sys=0.00, real=0.09 secs]
2018-05-30T20:38:27.513+0200: 15245.058: [GC (Allocation Failure) 2018-05-30T20:38:27.513+0200: 15245.058: [ParNew: 731825K->81920K(737280K), 0.0928862 secs] 16718359K->16566812K(31375360K), 0.0931079 secs] [Times: user=0.73 sys=0.00, real=0.10 secs]
2018-05-30T20:38:35.118+0200: 15252.664: [GC (Allocation Failure) 2018-05-30T20:38:35.119+0200: 15252.664: [ParNew: 728589K->81823K(737280K), 0.1155139 secs] 17213482K->17208899K(31375360K), 0.1157287 secs] [Times: user=0.91 sys=0.01, real=0.11 secs]
2018-05-30T20:38:39.004+0200: 15256.549: [GC (Allocation Failure) 2018-05-30T20:38:39.004+0200: 15256.549: [ParNew: 735843K->81920K(737280K), 0.0939004 secs] 17862919K->17682067K(31375360K), 0.0941023 secs] [Times: user=0.74 sys=0.00, real=0.10 secs]
2018-05-30T20:38:46.888+0200: 15264.433: [GC (Allocation Failure) 2018-05-30T20:38:46.888+0200: 15264.433: [ParNew: 730708K->78583K(737280K), 0.0952740 secs] 18330855K->18343737K(31375360K), 0.0954785 secs] [Times: user=0.75 sys=0.01, real=0.09 secs]
The issue still persists with the data nodes: 3 out of 8 data nodes are reporting far fewer blocks than expected.
Please assist.
Created 05-30-2018 08:27 PM
There is definitely a network problem with the 3 nodes. Are they the same hardware? Same NIC and network?
Average IO rate mb/sec: 27.063095092773438
Average IO rate mb/sec: 19.786481857299805
"Allocation Failure" is a cause of GC cycle to kick.
A GC allocation failure means that the garbage collector could not move objects from young gen to old gen fast enough because it does not have enough memory in old gen. This can cause application slowness.
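If you want to quantify that from the GC log excerpt above, a rough sketch like the following reads the ParNew entries and shows how fast the old generation is growing between minor collections (the line format matches the -XX:+PrintGCDetails output you already enabled; the file path is just an example):

import re

# Matches minor-GC lines like:
#   [ParNew: 733378K->81919K(737280K), 0.099 secs] 9892898K->9739137K(31375360K), ...
PAT = re.compile(r"\[ParNew: (\d+)K->(\d+)K\(\d+K\)[^\]]*\]\s*(\d+)K->(\d+)K\(\d+K\)")

prev_old = None
with open("gc.log") as f:                      # example path to the DataNode GC log
    for line in f:
        m = PAT.search(line)
        if not m:
            continue
        young_after = int(m.group(2))
        heap_after = int(m.group(4))
        old_after = heap_after - young_after   # old gen = total heap - young gen
        if prev_old is not None:
            grew_mb = (old_after - prev_old) / 1024.0
            print("old gen grew by %.0f MB since the previous minor GC" % grew_mb)
        prev_old = old_after

On the excerpt you posted, the old generation grows by roughly 600 MB every few seconds, which is why the heap fills up so quickly between collections.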
What's your DataNode maximum Java heap size?
https://community.hortonworks.com/questions/64677/datanode-heapsize-computation.html
https://community.hortonworks.com/questions/45381/do-i-need-to-tune-java-heap-size.html
https://community.hortonworks.com/questions/78981/data-node-heap-size-warning.html
Do you have NameNode HA configured? If so, it may be that a failover has occurred but the client doesn't detect this and retry its operation.
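To confirm whether a failover actually happened, each NameNode reports its HA state through the same JMX servlet; a small sketch (the hostnames are placeholders, 50070 is the default NameNode HTTP port):

import json
import urllib2  # Python 2

# Placeholder NameNode hosts; adjust for your cluster.
namenodes = ["nn1.example.com", "nn2.example.com"]

for nn in namenodes:
    url = "http://%s:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" % nn
    try:
        data = json.loads(urllib2.urlopen(url, timeout=10).read())
        for bean in data.get("beans", []):
            print("%s: %s" % (nn, bean.get("State")))   # 'active' or 'standby'
    except Exception as e:
        print("%s: JMX query failed (%s)" % (nn, e))

Equivalently, hdfs haadmin -getServiceState <nn-id> reports the same thing from the command line.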
If it isn't a production cluster, can you restart all the components?
Created 05-31-2018 05:38 AM
Thank you! I really appreciate your time and efforts.
1. The data node heap size is 30 GB. My worry is: if something is wrong with the configuration, why are only 3 nodes giving the issue and not the others? What should the ideal heap size for data nodes be, do you have any idea? I did not find any formula to calculate the heap size for data nodes.
2. We are using NameNode HA. I suspect that an HA switch-over might have caused this problem. I have restarted all the components. What should I check for if the issue is caused by NameNode HA? The NameNode heap size is 75 GB, about 70% used.
Created 05-31-2018 08:00 AM
Did you go through the links I posted above?
Your data node and Namenode heap sizes need some tuning.
Are you seeing any data node high HEAP SIZE alert?
Memory is estimated by considering the capacity of a cluster; values are rounded. As an example, take 200 hosts of 24 TB each = 4800 TB. That cluster physically stores approximately 36 million block files (at the default block size). Replication determines how many namespace blocks represent these block files: with 3x replication, those block files map to roughly 12 million blocks in the namespace.
At capacity, with the recommended allocation of 1 GB of memory per million blocks, this cluster needs 12 GB of maximum heap space.
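As a rough worked example of where that 12 GB figure comes from (assuming the default 128 MB block size and 3x replication, per the usual NameNode heap sizing guidance):

# Rough arithmetic behind the "1 GB of heap per million namespace blocks" rule of thumb.
# Assumes the default 128 MB block size and replication factor 3.
capacity_tb = 4800            # 200 hosts x 24 TB each
block_size_mb = 128
replication = 3

capacity_mb = capacity_tb * 1000 * 1000
block_files = capacity_mb / block_size_mb        # ~37 million block files on disk (~36 million quoted above)
namespace_blocks = block_files / replication     # ~12.5 million blocks in the namespace

heap_gb = namespace_blocks / 1000000.0           # 1 GB of heap per million namespace blocks
print("~%.1f GB of maximum heap at full capacity" % heap_gb)   # prints ~12.5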
Hope that helps please revert!
Created on 06-01-2018 06:03 AM - edited 08-17-2019 09:48 PM
Yes, I have been through the posts you mentioned. We had data node failure issues in the past; increasing the heap size fixed them, but I will fine-tune it. Below is the heap utilization for the data nodes (max heap 30 GB). The high heap usage data nodes (marked in red) are the problematic ones.
Hadoop env
SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"

export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"

export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=800m -XX:MaxNewSize=800m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=ERROR,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC"

export HADOOP_SECONDARYNAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node\" ${HADOOP_SECONDARYNAMENODE_OPTS}"
You mentioned "A GC allocation failure means that the garbage collector could not move objects from young gen to old gen fast enough because it does not have enough memory in old gen."
Which parameter holds the value for the old gen size?
We have got 8 data nodes, CPU 2*8, memory 256 GB, disk 12*6 TB = 72 TB per node.
8 hosts of 72 TB each = 576 TB.
But Ambari is reporting 156,710872 blocks; am I missing something here?
Awaiting your response. Thank you so much!
Created 06-03-2018 11:06 PM
Have you configured your cluster for rack awareness?
HDFS block placement will use rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
You will need the help of your network/data center team to share the network topology and how the nodes are spread out in the racks.
You can use the Ambari UI --> Hosts to set the rack topology once you know the subnets and DC setup. To understand this better, see HDP rack awareness, and also the HCC rack-awareness-series-1 and rack-awareness-series-2 articles.
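Under the hood, HDFS resolves racks through a topology script referenced by net.topology.script.file.name in core-site.xml; a minimal sketch of such a script (the subnet-to-rack mapping below is purely an example):

#!/usr/bin/env python
# Minimal HDFS topology script: receives one or more IPs/hostnames as arguments
# and prints one rack path per argument. The mapping here is only an example.
import sys

RACKS = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}
DEFAULT = "/default-rack"

for node in sys.argv[1:]:
    rack = DEFAULT
    for prefix, r in RACKS.items():
        if node.startswith(prefix):
            rack = r
            break
    print(rack)

When you assign racks through the Ambari UI, Ambari maintains an equivalent script and configuration for you.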
Hope that helps
Created on 06-06-2018 01:00 PM - edited 08-17-2019 09:47 PM
@Geoffrey Shelton Okot, thank you so much for getting back to me.
We don't have rack awareness enabled on our DR cluster, as it's only an 8 data node cluster. We do have rack awareness in our production cluster.
We can enable rack awareness later, but my first priority is to get the blocks back on the data nodes, as the faulty data nodes are not sending any block reports to the NameNode. Here is the current status as of today.
I am still getting the EOFException error on the problematic data nodes; the other data nodes are not giving this error.
I checked with our network team and they said all the data nodes are connected to the same NIC and there is no packet loss.
Hardware team found some correctable memory errors but nothing major.
Is there any maximum limit on the number of blocks a particular data node can hold? I mean, is there any possibility that such a limit has been exceeded on the problematic data nodes, and because of that they stopped sending block reports to the NameNode due to some capacity/resource constraint? Please guide. Do I need to report this as a bug to the Apache Foundation?
java.io.EOFException: End of File Exception between local host is: "DATANODE HOST"; destination host is: "NAMENODE HOST":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1498)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy15.blockReport(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:211)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:374)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:645)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:785)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1119)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1014)