Member since: 01-11-2016
Posts: 11
Kudos Received: 5
Solutions: 0
02-25-2016
06:17 PM
Problem fixed. It turns out we had a Sqoop job that kept writing to the cluster; once we killed it, the issue went away. Thanks @Neeraj Sabharwal!
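For anyone who finds this later: one way to locate and stop a runaway job like that (assuming it runs as a YARN application, which a Sqoop import normally does) is roughly:
# list running applications and find the offending job by name/user
yarn application -list -appStates RUNNING
# kill it by its application ID (the ID below is just a placeholder)
yarn application -kill application_1456270000000_0001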
02-25-2016
05:27 PM
1 Kudo
Thanks @Neeraj Sabharwal! Here are our current settings:
dfs.datanode.max.transfer.threads = 1024
dfs.datanode.handler.count = 100
dfs.client.file-block-storage-locations.num-threads: not set
dfs.blocksize = 134217728
Block replication = 3
Reserved space for HDFS = 1 GB
io.file.buffer.size = 131072
Thanks!
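In case it helps, these values can also be read back from the command line to confirm what the client actually sees (a quick sketch; property names can vary slightly between Hadoop versions):
# print the effective value of each property
hdfs getconf -confKey dfs.datanode.max.transfer.threads
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey io.file.buffer.size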
02-24-2016
08:21 PM
1 Kudo
Thanks @Neeraj Sabharwal! I've checked all the nodes in the RM web UI and all are healthy. I tried restarting the whole cluster, but the same problem happened again. I did not see anything in the ResourceManager logs. Should I change any configuration as shown in this thread?
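For completeness, the node-health check from the RM web UI can also be done on the command line, e.g.:
# list all NodeManagers with their state and health report
yarn node -list -all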
02-23-2016
08:39 PM
2 Kudos
We are using HDP 2.0. Recently we have been unable to write any new table to it. All components look healthy in the Ambari web UI. In the master node HDFS logs we found the following error messages:
2016-02-23 17:25:09,985 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(698)) - Exception for BP-1706820793-10.86.36.8-1381941559687:blk_1080366074_6646021
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.86.36.8:50010 remote=/10.80.27.210:54210]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:429)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:668)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:102)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
2016-02-23 17:25:09,985 ERROR datanode.DataNode (DataXceiver.java:run(225)) - dn01.nor1solutions.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.80.27.210:54210 dest: /10.86.36.8:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.86.36.8:50010 remote=/10.80.27.210:54210]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:429)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:668)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:102)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
Can anyone help fix this?
Thanks!
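A note for anyone hitting the same trace: the 60000 ms in the exception matches the default client socket read timeout, so before changing anything it may be worth checking the effective timeout settings and overall DataNode health. This is only a diagnostic sketch, and the property names may differ between Hadoop versions:
# effective socket timeouts, in milliseconds
hdfs getconf -confKey dfs.client.socket-timeout
hdfs getconf -confKey dfs.datanode.socket.write.timeout
# DataNode liveness, capacity, and last-contact times
hdfs dfsadmin -report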
Labels: Apache Hadoop
02-16-2016
11:24 PM
1 Kudo
Hi, we have a 5-node cluster currently running HDP 2.0. Recently we observed that YARN is reporting 2000% memory usage: we allocated 2 GB of memory to YARN, but the metrics show 40 GB used by our current job. All nodes are still "alive". Will that be a problem? Should we increase the memory allocated to the YARN cluster?
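For anyone checking the same numbers, the per-node and per-container limits come from yarn-site.xml; a quick way to inspect them on a node (assuming the standard HDP config path /etc/hadoop/conf):
# memory each NodeManager offers to containers, plus scheduler min/max per container
grep -A1 'yarn.nodemanager.resource.memory-mb' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.scheduler.minimum-allocation-mb' /etc/hadoop/conf/yarn-site.xml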
Labels: Apache YARN
01-28-2016
11:58 PM
Thank you @Artem Ervits! If I cannot turn off the firewall, is there any other option to fix the connection problem?
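One option, if iptables cannot be disabled entirely, would be to allow just the HiveServer2 port. This is a sketch for CentOS 6 iptables and assumes port 10000 is the only port being blocked:
# insert an ACCEPT rule for the HiveServer2 port ahead of the final DROP rule
iptables -I INPUT -p tcp --dport 10000 -j ACCEPT
# persist the rule across reboots (CentOS 6)
service iptables save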
01-28-2016
10:08 PM
Thanks @Artem Ervits for your help! I ran:
traceroute -p 10000 spark01.nor1solutions.com
and got the following result:
traceroute to spark01.nor1solutions.com (10.86.36.14), 64 hops max, 52 byte packets
The routes look normal. I then disabled the firewall with service iptables stop on the server, and Hive now appears to start successfully. But when I try to run any Hive query, I still get the following error:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: Connections could not be acquired from the underlying database!
Error Code: 0
H020 Could not establish connecton to spark01.nor1solutions.com:10000: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
Is there anything else that could be blocking the connection to port 10000?
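In case it's useful to anyone else: "Connection refused" usually means nothing is listening on the port rather than a firewall drop, so it may be worth confirming that something is bound to 10000 and then testing the JDBC endpoint directly. The Beeline URL below is only a sketch built from the hostname in this thread:
# confirm something is now bound to 10000
netstat -tnlp | grep ':10000'
# test the Thrift/JDBC endpoint directly
beeline -u 'jdbc:hive2://spark01.nor1solutions.com:10000'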
01-28-2016
08:42 PM
Thanks for replying! Here is the content of /var/lib/ambari-agent/data/errors-64.txt:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 106, in <module>
    HiveServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 97, in service_check
    (params.hostname, params.hive_server_port, elapsed_time))
resource_management.core.exceptions.Fail: Connection to Hive server spark01.nor1solutions.com on port 10000 failed after 295 seconds
01-28-2016
07:56 PM
Thanks for your reply! netstat -tupln | grep -i 10000 returned no results. To check the firewall settings I ran iptables -L and got the following result:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
DROP       icmp --  anywhere             anywhere            icmp timestamp-request
DROP       icmp --  anywhere             anywhere            icmp timestamp-reply
DROP       icmp --  anywhere             anywhere            icmp address-mask-request
ACCEPT     icmp --  anywhere             anywhere            icmp any
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  10.40.19.4           anywhere            tcp dpt:5666
ACCEPT     tcp  --  dbbkup02.nor1solutions.com            anywhere   tcp dpt:5666
ACCEPT     tcp  --  32.c2.9bc0.ip4.static.sl-reverse.com  anywhere   tcp dpt:5666
ACCEPT     tcp  --  nagios-dev.nor1sc.net                 anywhere   tcp dpt:5666
ACCEPT     udp  --  10.40.19.4           anywhere            udp dpt:snmp
ACCEPT     udp  --  dbbkup02.nor1solutions.com            anywhere   udp dpt:snmp
ACCEPT     udp  --  32.c2.9bc0.ip4.static.sl-reverse.com  anywhere   udp dpt:snmp
ACCEPT     tcp  --  10.0.0.0/8           anywhere            state NEW tcp dpt:ssh
ACCEPT     tcp  --  209.119.28.98        anywhere            state NEW tcp dpt:ssh
DROP       all  --  anywhere             anywhere
I'm using CentOS 6.7.
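Side note: since netstat shows nothing bound to 10000, the port is most likely empty rather than firewalled, so confirming whether HiveServer2 is actually running on that host would be the next check, e.g.:
# look for a running HiveServer2 process
ps -ef | grep -i '[h]iveserver2'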
01-28-2016
06:53 PM
Hi, I'm trying to install Hive on a remote server using the Ambari installation tool. I'm using Ambari 2.2 and installed HDP 2.3.0 with the following components: Kafka, Hive, Hadoop, Ambari Metrics, Tez, ZooKeeper, HBase. I got the following error from the Hive service check:
stderr: /var/lib/ambari-agent/data/errors-64.txt
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 106, in <module>
HiveServiceCheck().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 97, in service_check
(params.hostname, params.hive_server_port, elapsed_time))
resource_management.core.exceptions.Fail: Connection to Hive server spark01.nor1solutions.com on port 10000 failed after 295 seconds
All ports on the remote server are open. When I run
netstat -tupln | grep -i listen | grep -i 10000
there is no process listening on that port. I've retried the installation and the same error happened again.
Can anyone help me figure out how to fix this?
Thanks!
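Another place to look, since the service check only waits for the port, would be the HiveServer2 log on the Hive host for startup errors. The path below is the HDP default and is an assumption about this install:
# check the tail of the HiveServer2 log for startup failures
tail -n 200 /var/log/hive/hiveserver2.log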
Labels: