Created on 08-18-2014 10:59 AM - edited 09-16-2022 02:05 AM
[impl.ThriftTransportPool] WARN: Thread "shell" stuck on io to x.x.x.x:9999:9999 (0) for at least 120040 ms
Thanks in advance.
Created 09-21-2014 11:24 AM
Hi!
The problem you are seeing is a known limitation of Accumulo on small clusters. By default Accumulo attempts to use a replication factor of 5 for the metadata table, ignoring the "table.file.replication" setting. Normally, Cloudera Manager does not set a max replication factor, so on a small cluster the files are still created and HDFS only reports under-replication warnings until you either add nodes or manually lower the replication setting on that table.
In your cluster, it appears the "dfs.replication.max" setting has been adjusted to match your number of cluster nodes. This is causing Accumulo's attempts to create new files for its internal tables to fail.
Unfortunately, I'm not sure this can be fixed without data loss. However, to recover you should first edit the "dfs.replication.max" setting for HDFS to be >= 5. Then you should adjust the replication on the metadata and root tables to be <= your number of DataNodes. After that it should be safe to lower dfs.replication.max again.
Adjust the replication in the accumulo shell:
$> config -t accumulo.metadata -s table.file.replication=3
$> config -t accumulo.root -s table.file.replication=3
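To make the sizing rule above concrete, here is a minimal Python sketch. It assumes the simplified rule from this reply: the table replication should be <= the number of DataNodes, and HDFS rejects a write when the requested replication is above "dfs.replication.max". The function names and the three-DataNode example are hypothetical; the sketch only builds the shell command strings and does not talk to HDFS or Accumulo.

```python
# Replication requested by default for the metadata/root tables, per this thread.
ACCUMULO_DEFAULT_REPLICATION = 5


def choose_table_replication(num_datanodes, desired=ACCUMULO_DEFAULT_REPLICATION):
    """Return a table.file.replication value that is <= the number of DataNodes."""
    if num_datanodes < 1:
        raise ValueError("need at least one DataNode")
    return min(desired, num_datanodes)


def replication_max_blocks_writes(dfs_replication_max, requested=ACCUMULO_DEFAULT_REPLICATION):
    """True when the requested replication exceeds dfs.replication.max (simplified model)."""
    return requested > dfs_replication_max


def shell_commands(num_datanodes):
    """Build the Accumulo shell commands for the metadata and root tables."""
    value = choose_table_replication(num_datanodes)
    return [
        f"config -t {table} -s table.file.replication={value}"
        for table in ("accumulo.metadata", "accumulo.root")
    ]


if __name__ == "__main__":
    # Hypothetical three-DataNode cluster.
    for command in shell_commands(3):
        print(command)
```

With three DataNodes this yields the same two commands shown above; on a larger cluster the value is capped at the default of 5.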
Created 08-18-2014 03:16 PM
Hi NSU,
If you navigate to the Monitor web page, do you see any messages under "recent logs"?
Alternatively, are there any ERROR or WARN messages in the tablet server logs?
Mike
Created 08-18-2014 08:15 PM
Thank you for the reply. I am seeing the following errors and warnings in the log files.
18 22:56:44,0217 | tserver:DN2 | 48 | WARN | System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement. |
18 22:56:46,0982 | gc:DN1 | 48 | WARN | System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement. |
18 22:56:49,0061 | master:DN1 | 48 | WARN | System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement. |
18 22:56:51,0503 | tserver:master | 48 | WARN | System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement. |
18 23:03:38,0935 | tserver:DN2 | 103 | ERROR | org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /accumulo/tables/+r/root_tablet/F00000rq.rf_tmp on client xx.xx.xx.xx. Requested replication 5 exceeds maximum 3 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:942) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2216) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /accumulo/tables/+r/root_tablet/F00000rq.rf_tmp on client xx.xx.xx.xx. 
Requested replication 5 exceeds maximum 3 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:942) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2216) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) at org.apache.hadoop.ipc.Client.call(Client.java:1409) at org.apache.hadoop.ipc.Client.call(Client.java:1362) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy14.create(Unknown Source) at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy14.create(Unknown Source) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1599) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1461) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1386) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:126) at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106) at org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(FileOperations.java:80) at org.apache.accumulo.tserver.Compactor.call(Compactor.java:340) at org.apache.accumulo.tserver.MinorCompactor.call(MinorCompactor.java:96) at org.apache.accumulo.tserver.Tablet.minorCompact(Tablet.java:2045) at org.apache.accumulo.tserver.Tablet.access$4300(Tablet.java:170) at org.apache.accumulo.tserver.Tablet$MinorCompactionTask.run(Tablet.java:2132) at org.apache.accumulo.tserver.Tablet.minorCompactNow(Tablet.java:2238) at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2922) at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34) at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$3.run(TabletServer.java:2277) |
18 23:03:38,0937 | tserver:DN2 | 103 | WARN | MinC failed (file /accumulo/tables/+r/root_tablet/F00000rq.rf_tmp on client xx.xx.xx.xx. Requested replication 5 exceeds maximum 3 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:942) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2216) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) ) to create hdfs://DN1:8020/accumulo/tables/+r/root_tablet/F00000rq.rf_tmp retrying ... |
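The line that matters in the traces above is the HDFS message "Requested replication 5 exceeds maximum 3". A minimal Python sketch for pulling that message out of a long log follows; the regular expression is based only on the wording seen in these logs, and the function name and sample string are hypothetical.

```python
import re

# Pattern taken from the message text in the logs above; other Hadoop
# versions may word it differently (assumption).
REPLICATION_ERROR = re.compile(r"Requested replication (\d+) exceeds maximum (\d+)")


def find_replication_errors(log_text):
    """Return the distinct (requested, maximum) pairs found in the log text."""
    pairs = {(int(req), int(mx)) for req, mx in REPLICATION_ERROR.findall(log_text)}
    return sorted(pairs)


sample = (
    "file /accumulo/tables/+r/root_tablet/F00000rq.rf_tmp on client xx.xx.xx.xx. "
    "Requested replication 5 exceeds maximum 3 at "
    "org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:942)"
)

print(find_replication_errors(sample))  # prints [(5, 3)]
```

Running it over a full tablet server log gives a quick view of which replication values are being requested and which maximum HDFS is enforcing.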
Created 08-19-2014 09:03 AM
Thanks for sharing those logs, that is very helpful.
Just to confirm, which versions of Accumulo and CDH are you using? Also, if you are using Cloudera Manager, which version of that?
Created 08-19-2014 09:32 AM
I am using the following versions:
Accumulo - 1.6
CDH - 5
Cloudera Manager - 5.1
Thank you for the response.
Created on 08-20-2014 07:36 AM - edited 08-20-2014 08:02 AM
Hi,
The following are some logs I am seeing in the per-table problem report.
accumulo.root | FILE_WRITE | xx.xx.xx | 2014/08/18 13:21:33 EDT | hdfs://DN1:8020/accumulo/tables/+r/root_tablet/F00000e9.rf_tmp | file /accumulo/tables/+r/root_tablet/F00000e9.rf_tmp on client xx.xx.xx. Requested replication 5 exceeds maximum 2 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:942) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2216) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) |
Thank you for the response.
Please help me.
Created 08-20-2014 01:25 PM
Hi NSU,
A couple more questions about your configuration. I think I've identified the issue and am working on a solution for you.
I would like to make sure that you used Cloudera Manager to initialize Accumulo. Please confirm that this is the case.
Are you using a single node cluster? If not, how many hosts do you have? An order of magnitude is fine if you do not have the exact number available.
Thanks,
Mike
Created 08-20-2014 03:12 PM
Created 08-20-2014 03:19 PM
Can you give me some description of the layout of the roles on your three nodes?
I expect that it is NameNode on one, Accumulo Master on the second, and DataNode + Tablet Server on the third. Is this correct?
Created 08-20-2014 03:48 PM