error processing READ_BLOCK operation

Explorer

I see this error intermittently during periods of heavy writes to one of our collections. Eventually the errors cascade, the collection becomes unavailable, and then it self-corrects after 30-40 minutes. Any ideas why this would happen? On the datanode side, I'm seeing "error processing READ_BLOCK operation":

 

DatanodeRegistration(10.130.71.220, storageID=DS-1520167466-10.130.71.220-50010-1389140294030, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1124238169;c=0):Got exception while serving BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487 to /10.130.71.220:45293
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:382)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:193)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
	at java.lang.Thread.run(Thread.java:662)
node1:50010:DataXceiver error processing READ_BLOCK operation  src: /10.130.71.220:45293 dest: /10.130.71.220:50010
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:382)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:193)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
	at java.lang.Thread.run(Thread.java:662)

This is the error on the Solr side:

Failed to connect to /10.130.71.220:50010 for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim for block BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.130.71.220:45822, remote=/10.130.71.220:50010, for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim, for pool BP-1661432518-10.130.71.219-1389140286721 block 5701049520037157281_7691487
Failed to connect to /10.130.71.219:50010 for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim for block BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.130.71.220:53392, remote=/10.130.71.219:50010, for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim, for pool BP-1661432518-10.130.71.219-1389140286721 block 5701049520037157281_7691487
Failed to connect to /10.130.71.222:50010 for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim for block BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.130.71.220:50167, remote=/10.130.71.222:50010, for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim, for pool BP-1661432518-10.130.71.219-1389140286721 block 5701049520037157281_7691487
Could not obtain BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
DFS chooseDataNode: got # 1 IOException, will wait for 2930.407707541845 msec.
java.io.FileNotFoundException: File does not exist: /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:39)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1439)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1391)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1367)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1340)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at sun.reflect.GeneratedConstructorAccessor2536.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:972)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:238)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:866)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:889)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1148)
	at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:71)
	at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:90)
	at org.apache.solr.store.hdfs.HdfsDirectory$HdfsIndexInput.readInternal(HdfsDirectory.java:189)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readBytes(CustomBufferedIndexInput.java:106)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readBytes(CustomBufferedIndexInput.java:61)
	at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.readIntoCacheAndResult(BlockDirectory.java:206)
	at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.fetchBlock(BlockDirectory.java:193)
	at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.readInternal(BlockDirectory.java:177)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.refill(CustomBufferedIndexInput.java:185)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readByte(CustomBufferedIndexInput.java:40)
	at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readVInt(CustomBufferedIndexInput.java:154)
	at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2357)
	at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1693)
	at org.apache.lucene.index.TermContext.build(TermContext.java:95)
	at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:166)
	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:687)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909)
	at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:253)
	at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:210)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:739)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:169)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.ProxyUserFilter.doFilter(ProxyUserFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:140)
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
	at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:145)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:39)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1439)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1391)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1367)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1340)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at org.apache.hadoop.ipc.Client.call(Client.java:1225)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at $Proxy9.getBlockLocations(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at $Proxy9.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:155)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
	... 61 more

8 Replies

Re: error processing READ_BLOCK operation

Cloudera Employee

Given the Solr exception, this looks like SOLR-5693: https://issues.apache.org/jira/browse/SOLR-5693

 

We have backported this, and it will be included in the next release.

 

This is the first report I have seen of it causing that kind of disruption, though. Usually it just causes brief, intermittent problems with NRT searches.

Re: error processing READ_BLOCK operation

Explorer

Thanks for the quick response. We see this more regularly when the writes are heavy. What could be a workaround for this error? Should I increase the autoCommit interval (currently it is set to 60000 ms)? Any other workaround would really help. Thanks!

Re: error processing READ_BLOCK operation

Explorer

Would this be fixed if I disable NRT caching by setting solr.hdfs.nrtcachingdirectory.enable = false?
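
For reference, that flag is a parameter of the HdfsDirectoryFactory in solrconfig.xml. A minimal sketch of where it would go (the solr.hdfs.home value is a placeholder for your cluster, and other factory parameters are omitted):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- placeholder path; point at your cluster's Solr home in HDFS -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <!-- turns off the NRTCachingDirectory wrapper -->
  <bool name="solr.hdfs.nrtcachingdirectory.enable">false</bool>
</directoryFactory>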

Re: error processing READ_BLOCK operation

Cloudera Employee

I think committing more often is probably your best bet. I believe we actually default to 15 seconds with upstream Solr, to minimize the default RAM usage of the transaction log and based on some testing another committer did on the performance impact.

 

There is no great workaround beyond not using NRT, but more frequent hard commits could make it less likely to occur.
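
For example, the hard autoCommit stanza in solrconfig.xml could be tightened along these lines (a sketch only; the 15000 ms value mirrors the 15-second default mentioned above, and openSearcher=false is the usual pairing so hard commits only flush to stable storage without opening a new searcher):

<autoCommit>
  <!-- hard commit at most every 15 seconds (15000 ms); tune for your write load -->
  <maxTime>15000</maxTime>
  <!-- flush to stable storage only; search visibility is governed by soft commits -->
  <openSearcher>false</openSearcher>
</autoCommit>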

Re: error processing READ_BLOCK operation

Cloudera Employee

No, the NRTCachingDirectory is not really related to the issue.

Re: error processing READ_BLOCK operation

Explorer

Thanks. So how do I make it not use NRT? Is there any configuration in solrconfig, or somewhere else, where we can disable NRT?

Re: error processing READ_BLOCK operation

Explorer

Hi guys,

Did you find a solution to this error?

Sep 12, 11:26:10.497 AM ERROR org.apache.hadoop.hdfs.server.datanode.DataNode
worker1.scd.sasol.com:1004:DataXceiver error processing READ_BLOCK operation  src: /x.x.x.x:44258 dst: /x.x.x.x:1004
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-810251609-10.138.255.40-1492692599116:blk_1074274572_550782
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:456)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:236)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:745)

Re: error processing READ_BLOCK operation

Super Collaborator
That is likely a data corruption issue in HDFS. Run an 'hdfs fsck /solr' to confirm.
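
For example (a sketch; /solr assumes that is the Solr home directory in your HDFS, as in the original poster's paths):

# summarize filesystem health and list any corrupt files under /solr
hdfs fsck /solr -list-corruptfileblocks

# or get per-file block and replica-location detail
hdfs fsck /solr -files -blocks -locations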

-pd