Member since: 01-11-2016
Posts: 11
Kudos Received: 5
Solutions: 0
02-25-2016
06:17 PM
Problem fixed. It turns out we had a Sqoop job that kept writing to the cluster; once we killed it, the issue went away. Thanks @Neeraj Sabharwal!
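For anyone who finds this later: one way to locate and stop a runaway job like that (assuming it runs as a YARN application, which a Sqoop import normally does) is roughly:
# list running applications and find the offending job by name/user
yarn application -list -appStates RUNNING
# kill it by its application ID (the ID below is just a placeholder)
yarn application -kill application_1456270000000_0001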
02-25-2016
05:27 PM
1 Kudo
Thanks @Neeraj Sabharwal! Here are our current settings:
dfs.datanode.max.transfer.threads = 1024
dfs.datanode.handler.count = 100
dfs.client.file-block-storage-locations.num-threads: not set
dfs.blocksize = 134217728
Block replication = 3
Reserved space for HDFS = 1 GB
io.file.buffer.size = 131072
Thanks!
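In case it helps, these values can also be read back from the command line to confirm what the client actually sees (a quick sketch; property names can vary slightly between Hadoop versions):
# print the effective value of each property
hdfs getconf -confKey dfs.datanode.max.transfer.threads
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey io.file.buffer.size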
02-24-2016
08:21 PM
1 Kudo
Thanks @Neeraj Sabharwal! I've checked all the nodes in the RM web UI and all are healthy. I tried restarting the whole cluster, but the same problem happened again. I did not see anything in the ResourceManager logs. Should I change any configuration as shown in this thread?
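For completeness, the node-health check from the RM web UI can also be done on the command line, e.g.:
# list all NodeManagers with their state and health report
yarn node -list -all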
02-23-2016
08:39 PM
2 Kudos
We are using HDP 2.0. Recently we have been unable to write any new table to it. All components look healthy in the Ambari web UI. In the master node HDFS logs we found the following error messages:
2016-02-23 17:25:09,985 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(698)) - Exception for BP-1706820793-10.86.36.8-1381941559687:blk_1080366074_6646021
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.86.36.8:50010 remote=/10.80.27.210:54210]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:429)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:668)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:102)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
2016-02-23 17:25:09,985 ERROR datanode.DataNode (DataXceiver.java:run(225)) - dn01.nor1solutions.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.80.27.210:54210 dest: /10.86.36.8:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.86.36.8:50010 remote=/10.80.27.210:54210]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:429)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:668)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:102)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
Can anyone help fix this?
Thanks!
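A note for anyone hitting the same trace: the 60000 ms in the exception matches the default client socket read timeout, so before changing anything it may be worth checking the effective timeout settings and overall DataNode health. This is only a diagnostic sketch, and the property names may differ between Hadoop versions:
# effective socket timeouts, in milliseconds
hdfs getconf -confKey dfs.client.socket-timeout
hdfs getconf -confKey dfs.datanode.socket.write.timeout
# DataNode liveness, capacity, and last-contact times
hdfs dfsadmin -report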
Labels: Apache Hadoop
02-16-2016
11:24 PM
1 Kudo
Hi, we have a 5-node cluster currently running HDP 2.0. Recently we observed that YARN is reporting 2000% memory usage: we allocated 2 GB of memory to YARN, but the metrics show 40 GB used by our current job. All nodes are still "alive". Will that be a problem? Should we increase the memory allocated to the YARN cluster?
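For anyone checking the same numbers, the per-node and per-container limits come from yarn-site.xml; a quick way to inspect them on a node (assuming the standard HDP config path /etc/hadoop/conf):
# memory each NodeManager offers to containers, plus scheduler min/max per container
grep -A1 'yarn.nodemanager.resource.memory-mb' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.scheduler.minimum-allocation-mb' /etc/hadoop/conf/yarn-site.xml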
Labels: Apache YARN
01-28-2016
11:58 PM
Thank you @Artem Ervits! If I cannot turn off the firewall, is there any other option to fix the connection problem?
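One option, if iptables cannot be disabled entirely, would be to allow just the HiveServer2 port. This is a sketch for CentOS 6 iptables and assumes port 10000 is the only port being blocked:
# insert an ACCEPT rule for the HiveServer2 port ahead of the final DROP rule
iptables -I INPUT -p tcp --dport 10000 -j ACCEPT
# persist the rule across reboots (CentOS 6)
service iptables save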
01-28-2016
10:08 PM
Thanks @Artem Ervits for your help! I ran:
traceroute -p 10000 spark01.nor1solutions.com
and got the following result:
traceroute to spark01.nor1solutions.com (10.86.36.14), 64 hops max, 52 byte packets
The routes look normal. I then disabled the firewall with service iptables stop on the server, and Hive now appears to start successfully. But when I try to run any Hive query, I still get the following error:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: Connections could not be acquired from the underlying database!
Error Code: 0
H020 Could not establish connecton to spark01.nor1solutions.com:10000: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
Is there anything else that could be blocking the connection to port 10000?
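In case it's useful to anyone else: "Connection refused" usually means nothing is listening on the port rather than a firewall drop, so it may be worth confirming that something is bound to 10000 and then testing the JDBC endpoint directly. The Beeline URL below is only a sketch built from the hostname in this thread:
# confirm something is now bound to 10000
netstat -tnlp | grep ':10000'
# test the Thrift/JDBC endpoint directly
beeline -u 'jdbc:hive2://spark01.nor1solutions.com:10000'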
01-28-2016
08:42 PM
Thanks for replying! Here is the content of /var/lib/ambari-agent/data/errors-64.txt:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 106, in <module>
    HiveServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 97, in service_check
    (params.hostname, params.hive_server_port, elapsed_time))
resource_management.core.exceptions.Fail: Connection to Hive server spark01.nor1solutions.com on port 10000 failed after 295 seconds
01-28-2016
07:56 PM
Thanks for your reply! netstat -tupln | grep -i 10000 returned no results. To check the firewall settings I ran iptables -L and got the following result:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
DROP       icmp --  anywhere             anywhere            icmp timestamp-request
DROP       icmp --  anywhere             anywhere            icmp timestamp-reply
DROP       icmp --  anywhere             anywhere            icmp address-mask-request
ACCEPT     icmp --  anywhere             anywhere            icmp any
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  10.40.19.4           anywhere            tcp dpt:5666
ACCEPT     tcp  --  dbbkup02.nor1solutions.com            anywhere   tcp dpt:5666
ACCEPT     tcp  --  32.c2.9bc0.ip4.static.sl-reverse.com  anywhere   tcp dpt:5666
ACCEPT     tcp  --  nagios-dev.nor1sc.net                 anywhere   tcp dpt:5666
ACCEPT     udp  --  10.40.19.4           anywhere            udp dpt:snmp
ACCEPT     udp  --  dbbkup02.nor1solutions.com            anywhere   udp dpt:snmp
ACCEPT     udp  --  32.c2.9bc0.ip4.static.sl-reverse.com  anywhere   udp dpt:snmp
ACCEPT     tcp  --  10.0.0.0/8           anywhere            state NEW tcp dpt:ssh
ACCEPT     tcp  --  209.119.28.98        anywhere            state NEW tcp dpt:ssh
DROP       all  --  anywhere             anywhere
I'm using CentOS 6.7.
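Side note: since netstat shows nothing bound to 10000, the port is most likely empty rather than firewalled, so confirming whether HiveServer2 is actually running on that host would be the next check, e.g.:
# look for a running HiveServer2 process
ps -ef | grep -i '[h]iveserver2'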
01-28-2016
06:53 PM
Hi, I'm trying to install Hive on a remote server using the Ambari installation tool. I'm using Ambari 2.2 and installed HDP 2.3.0 with the following components: Kafka, Hive, Hadoop, Ambari Metrics, Tez, ZooKeeper, HBase. I got the following error from the Hive service check:
stderr: /var/lib/ambari-agent/data/errors-64.txt
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 106, in <module>
HiveServiceCheck().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 97, in service_check
(params.hostname, params.hive_server_port, elapsed_time))
resource_management.core.exceptions.Fail: Connection to Hive server spark01.nor1solutions.com on port 10000 failed after 295 seconds
All ports on the remote server are open. When I run
netstat -tupln | grep -i listen | grep -i 10000
there is no process listening on that port. I've retried the installation and the same error happened again.
Can anyone help me figure out how to fix this?
Thanks!
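Another place to look, since the service check only waits for the port, would be the HiveServer2 log on the Hive host for startup errors. The path below is the HDP default and is an assumption about this install:
# check the tail of the HiveServer2 log for startup failures
tail -n 200 /var/log/hive/hiveserver2.log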
Labels: