Member since: 07-25-2016
Posts: 40
Kudos Received: 5
Solutions: 0
08-29-2017
07:17 AM
I am using Hadoop Apache 2.7.1 on CentOS 7
in an HA cluster that consists of two namenodes and 6 datanodes,
and I noticed the following error that keeps appearing in my log:
DataXceiver error processing WRITE_BLOCK operation src: /172.16.1.153:38360 dst: /172.16.1.153:50010
java.io.IOException: Connection reset by peer
So I updated the following property in hdfs-site.xml
in order to increase the number of available threads on all datanodes:
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>16000</value>
</property>
and I also increased the number of open files by adding ulimit -n 16384 to .bashrc.
But I am still getting this error in my datanode logs. So, while sending write requests to the cluster, I issued the following command on the datanodes to check the number of threads: cat /proc/processid/status
They never exceed 100 threads. To check the number of open files I issued
sysctl fs.file-nr, and it never exceeds 300 open files. So why am I always getting this error in the datanode logs, and what is its effect on performance?
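A side note: ulimit -n set in .bashrc only applies to interactive shells, so a DataNode started from an init script or service may still be running with the old limit. A minimal sketch for checking the running DataNode process directly (jps, jstack and /proc are standard tools; the grep patterns below are only illustrative):
DN_PID=$(jps | awk '/DataNode/ {print $1}')   # PID of the DataNode JVM
grep 'Max open files' /proc/$DN_PID/limits    # effective open-file limit of that process
jstack $DN_PID | grep -c 'DataXceiver'        # count of live DataXceiver threads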
Labels: Apache Hadoop
08-22-2017
01:36 PM
I am using Hadoop Apache 2.7.1 in a high-availability setup. When issuing the command hdfs dfs -put file1 /hadoophome/ I am not able to put my file, with the following log on one of the available datanodes (I have 6 datanodes and the replication factor is 3):
2017-08-22 15:01:07,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111953_371166, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-08-22 15:01:07,938 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162, type=HAS_DOWNSTREAM_IN_PIPELINE java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1237) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:08,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-840293587-192.168.25.22-1499689510217:blk_1074111954_371167 src: /192.168.25.8:35957 dest: /192.168.25.2:50010
2017-08-22 15:01:09,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.25.8:35957, dest: /192.168.25.2:50010, bytes: 82, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1028946205_23, offset: 0, srvID: 075f3a14-9b13-404d-8ba8-4066212655d7, blockid: BP-840293587-192.168.25.22-1499689510217:blk_1074111954_371167, duration: 3404900
2017-08-22 15:01:09,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111954_371167, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-08-22 15:01:09,280 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run(): java.net.SocketTimeoutException: 300 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.25.2:50010 remote=/192.168.25.2:48085] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.DataOutputStream.flush(DataOutputStream.java:123) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1473) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1410) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1323) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:09,281 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162, type=HAS_DOWNSTREAM_IN_PIPELINE java.net.SocketTimeoutException: 300 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.25.2:50010 remote=/192.168.25.2:48085] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.DataOutputStream.flush(DataOutputStream.java:123) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1473) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1410) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1323) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:09,281 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-08-22 15:01:10,808 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:3173ms (threshold=300ms)
2017-08-22 15:01:10,808 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162 java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:417) at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:10,809 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162 received exception java.nio.channels.ClosedByInterruptException
2017-08-22 15:01:10,809 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dn2:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.25.2:48085 dst: /192.168.25.2:50010 java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:417) at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:11,314 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162 src: /192.168.25.2:48091 dest: /192.168.25.2:50010
2017-08-22 15:01:11,314 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371162
2017-08-22 15:01:11,314 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1074111949_371162, RBW getNumBytes() = 53932032 getBytesOnDisk() = 53932032 getVisibleLength()= 53932032 getVolume() = /hdd/data_dir/current getBlockFile() = /hdd/data_dir/current/BP-840293587-192.168.25.22-1499689510217/current/rbw/blk_1074111949 bytesAcked=53932032 bytesOnDisk=53932032
2017-08-22 15:01:11,619 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371169, type=HAS_DOWNSTREAM_IN_PIPELINE java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1237) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:11,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371169 java.net.SocketTimeoutException: 300 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.25.2:50010 remote=/192.168.25.2:48091] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:11,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371169, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
2017-08-22 15:01:11,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371169, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-08-22 15:01:11,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-840293587-192.168.25.22-1499689510217:blk_1074111949_371169 received exception java.net.SocketTimeoutException: 300 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.25.2:50010 remote=/192.168.25.2:48091]
2017-08-22 15:01:11,631 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dn2:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.25.2:48091 dst: /192.168.25.2:50010 java.net.SocketTimeoutException: 300 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.25.2:50010 remote=/192.168.25.2:48091] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) at java.lang.Thread.run(Thread.java:748)
2017-08-22 15:01:11,959 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-840293587-192.168.25.22-1499689510217:blk_1074111956_371170 src: /192.168.25.8:35958 dest: /192.168.25.2:50010
2017-08-22 15:01:11,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.25.8:35958, dest: /192.168.25.2:50010, bytes: 82, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_899288339_23, offset: 0, srvID: 075f3a14-9b13-404d-8ba8-4066212655d7, blockid: BP-840293587-192.168.25.22-1499689510217:blk_1074111956_371170, duration: 22005000
2017-08-22 15:01:11,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111956_371170, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2017-08-22 15:01:15,027 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-840293587-192.168.25.22-1499689510217:blk_1074111957_371172 src: /192.168.25.2:48097 dest: /192.168.25.2:50010
2017-08-22 15:01:15,037 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.25.2:48097, dest: /192.168.25.2:50010, bytes: 82, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1375829046_24, offset: 0, srvID: 075f3a14-9b13-404d-8ba8-4066212655d7, blockid: BP-840293587-192.168.25.22-1499689510217:blk_1074111957_371172, duration: 3663100
2017-08-22 15:01:15,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-840293587-192.168.25.22-1499689510217:blk_1074111957_371172, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
Any help to determine what the problem is, please?
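A note on the log above: the repeated "300 millis timeout" messages are far below the stock defaults (60000 ms for dfs.client.socket-timeout and 480000 ms for dfs.datanode.socket.write.timeout), which suggests a very low socket timeout may have been configured somewhere. A minimal hdfs-site.xml sketch that restores the default values, assuming the 300 ms figure really does come from these two properties and not from some other override:
<property>
<name>dfs.client.socket-timeout</name>
<value>60000</value>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>480000</value>
</property>
The "Slow BlockReceiver write data to disk cost:3173ms" warning, on the other hand, points at slow disk I/O on that datanode rather than at a timeout setting.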
Labels: Apache Hadoop
08-15-2017
12:33 PM
I am using Hadoop Apache 2.7.1,
and I have configured the datanode directory to have multiple directories:
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/data_dir,file:///hdd/data_dir/</value>
<final>true</final>
</property>
According to this configuration, writing file data should happen in both directories,
/opt/hadoop/data_dir and file:///hdd/data_dir/, with the same block names and in the same subdirectory names. But in my cluster this behavior is not happening: sometimes it writes data blocks to the
local directory /opt/hadoop/data_dir and sometimes it writes data blocks to the external hard drive directory file:///hdd/data_dir. What could the possible reasons be, and how can I control this behavior?
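For reference, a minimal way to see which configured directory actually received blocks is to list the block files under both locations and to map a file to its blocks with fsck; /path/to/file below is only a placeholder, and the sketch assumes the standard DataNode block layout (a current/BP-*/... tree under each configured directory):
find /opt/hadoop/data_dir /hdd/data_dir -name 'blk_*' -not -name '*.meta' | sort
hdfs fsck /path/to/file -files -blocks -locations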
Labels: Apache Hadoop
04-09-2017
10:28 AM
I am using Hadoop Apache 2.7.1 on CentOS 7 and I am new to Linux. I want to use HttpFS, so I should follow this link: https://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/ServerSetup.html. But the problem is that I can't find any download for httpfs-3.0.0-alpha2.tar.gz
to untar. Please, is there any link or mirror that could help? And will this HttpFS version work with my existing Hadoop version?
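One note that may help: the Hadoop 2.x binary distribution already bundles HttpFS, so a separate httpfs tarball should not be needed for 2.7.1 (the httpfs-3.0.0-alpha2 artifact belongs to the 3.0 alpha line). A minimal sketch, assuming the stock 2.7.1 tarball layout and the default HttpFS port 14000; httpfs-host is a placeholder for the machine running HttpFS:
sbin/httpfs.sh start
curl "http://httpfs-host:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"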
Labels: Apache Hadoop
04-05-2017
11:17 AM
I am using Hadoop Apache 2.7.1. After setting up high availability in the Hadoop cluster, the automatic ZooKeeper failover controller (ZKFC) will apply a fencing method to fence (stop) one of the two namenodes if it goes down, and the dfs.ha.fencing.methods property in hdfs-site.xml configures this method as sshfence. But my question is: what if we have password-protected SSH? Can fencing happen, or does automatic failover work only with passwordless SSH? Is there any way to make sshfence include a password for SSH in the configuration?
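For reference, the sshfence method documented in the QJM HA guide authenticates with a private key via dfs.ha.fencing.ssh.private-key-files; there is no documented way to supply an SSH password, so key-based (passwordless) SSH is effectively required for sshfence. The guide also allows an arbitrary shell command as a fencing method; a minimal sketch (the /bin/true command here is only an illustration and performs no real fencing):
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>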
Labels: Apache Hadoop
04-04-2017
05:47 AM
I am using Hadoop Apache 2.7.1, and I have followed the link you supplied: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html. Finally I tried to force one of the two namenodes to become active manually by applying hdfs haadmin -transitionToActive hadoop-master, with the following response:
17/04/04 03:13:06 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
17/04/04 03:13:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/04 03:13:07 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
Operation failed: End of File Exception between local host is: "hadoop-master/192.168.4.128"; destination host is: "hadoop-master":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
What should I do with two standby namenodes? Should I apply a namenode format on one of these two namenodes?
04-03-2017
02:04 PM
Running ./zkCli on both namenodes shows the same error:
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0] 2017-04-03 09:57:34,141 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-03 09:57:34,148 [myid:] - WARN [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
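"Connection refused" on 127.0.0.1:2181 usually just means nothing is listening on that port on that host. A minimal sketch of quick checks (the nc probe assumes netcat is installed):
zkServer.sh status
ss -ltnp | grep 2181
echo ruok | nc localhost 2181   # a healthy ZooKeeper server answers imok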
04-03-2017
01:29 PM
Even though I started the ZooKeeper server, and I get leader mode on one of the two namenodes and follower mode on the other namenode and on the datanode, I still get the same problem: both of the two namenodes are standby. Also, there are no log files under the log directory that is configured in zoo.cfg, so I can't see the ZooKeeper errors. But I think that when zkServer.sh status gives a status (follower or leader) it indicates that everything with ZooKeeper is all right, doesn't it?
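A leader/follower status only shows that the ZooKeeper ensemble itself is healthy; automatic namenode failover additionally needs the ZKFC daemon running on each namenode and the HA znode initialized once, as described in the QJM HA guide. A minimal sketch of checks, assuming the nameservice mycluster used elsewhere in this thread:
jps | grep DFSZKFailoverController   # is a ZKFC process running on each namenode?
hdfs zkfc -formatZK                  # one-time initialization of the failover znode
ls /hadoop-ha/mycluster              # run from zkCli.sh; the znode should then exist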
04-03-2017
01:01 PM
I have configured high availability in my cluster,
which consists of three nodes:
hadoop-master (192.168.4.128) (namenode)
hadoop-slave-1 (192.168.4.111) (another namenode)
hadoop-slave-2 (192.168.4.106) (datanode)
without formatting the namenode (converting a non-HA-enabled cluster to be HA-enabled), as described here:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
But I got two namenodes working as standby,
so I tried to transition one of these two nodes to active by applying the following command:
hdfs haadmin -transitionToActive mycluster --forcemanual
with the following output:
17/04/03 08:07:35 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-master/192.168.4.128:8020
17/04/03 08:07:36 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at hadoop-slave-1/192.168.4.111:8020
Illegal argument: Unable to determine service address for namenode 'mycluster'
My core-site.xml is:
<property>
<name>dfs.tmp.dir</name>
<value>/opt/hadoop/data15</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:8020</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/journal/node/local/data</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
</property>
My hdfs-site.xml is:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/data16</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/data17</value>
<final>true</final>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-slave-1:50090</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<final>true</final>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>hadoop-master,hadoop-slave-1</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop-master</name>
<value>hadoop-master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop-slave-1</name>
<value>hadoop-slave-1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop-master</name>
<value>hadoop-master:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop-slave-1</name>
<value>hadoop-slave-1:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop-master:8485;hadoop-slave-2:8485;hadoop-slave-1:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop-master:2181,hadoop-slave-1:2181,hadoop-slave-2:2181</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
What should the service address value be? And what possible solutions can I apply in order
to turn one of the two namenodes to the active state? Note: the ZooKeeper server on all three nodes is stopped.
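For reference, hdfs haadmin -transitionToActive expects one of the namenode IDs listed in dfs.ha.namenodes.mycluster (hadoop-master or hadoop-slave-1) rather than the nameservice name itself, which is why 'mycluster' produces the "Unable to determine service address" error. A minimal sketch using the IDs defined in the hdfs-site.xml above; note that with automatic failover enabled, the expectation is normally that the ZooKeeper quorum and the ZKFC daemons are running:
hdfs haadmin -getServiceState hadoop-master
hdfs haadmin -getServiceState hadoop-slave-1
hdfs haadmin -transitionToActive hadoop-master --forcemanual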
Labels: Apache Hadoop