Explorer
Posts: 20
Registered: ‎12-02-2013

error processing READ_BLOCK operation

I see this error intermittently during periods of heavy writes to one of our collections. Eventually the errors cascade until the collection becomes unavailable; it self-corrects after 30-40 minutes. Any ideas why this would happen? On the DataNode side, I'm seeing "error processing READ_BLOCK operation":

DatanodeRegistration(10.130.71.220, storageID=DS-1520167466-10.130.71.220-50010-1389140294030, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1124238169;c=0):Got exception while serving BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487 to /10.130.71.220:45293
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:382)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:193)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
	at java.lang.Thread.run(Thread.java:662)
node1:50010:DataXceiver error processing READ_BLOCK operation  src: /10.130.71.220:45293 dest: /10.130.71.220:50010
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:382)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:193)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
	at java.lang.Thread.run(Thread.java:662)

This is the error on the Solr side:

Failed to connect to /10.130.71.220:50010 for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim for block BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.130.71.220:45822, remote=/10.130.71.220:50010, for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim, for pool BP-1661432518-10.130.71.219-1389140286721 block 5701049520037157281_7691487
Failed to connect to /10.130.71.219:50010 for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim for block BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.130.71.220:53392, remote=/10.130.71.219:50010, for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim, for pool BP-1661432518-10.130.71.219-1389140286721 block 5701049520037157281_7691487
Failed to connect to /10.130.71.222:50010 for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim for block BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.130.71.220:50167, remote=/10.130.71.222:50010, for file /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim, for pool BP-1661432518-10.130.71.219-1389140286721 block 5701049520037157281_7691487
Could not obtain BP-1661432518-10.130.71.219-1389140286721:blk_5701049520037157281_7691487 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
DFS chooseDataNode: got # 1 IOException, will wait for 2930.407707541845 msec.
java.io.FileNotFoundException: File does not exist: /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:39)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1439)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1391)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1367)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1340)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at sun.reflect.GeneratedConstructorAccessor2536.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:972)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:238)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:866)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:889)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1148)
	at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:71)
	at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:90)
	at org.apache.solr.store.hdfs.HdfsDirectory$HdfsIndexInput.readInternal(HdfsDirectory.java:189)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readBytes(CustomBufferedIndexInput.java:106)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readBytes(CustomBufferedIndexInput.java:61)
	at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.readIntoCacheAndResult(BlockDirectory.java:206)
	at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.fetchBlock(BlockDirectory.java:193)
	at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.readInternal(BlockDirectory.java:177)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.refill(CustomBufferedIndexInput.java:185)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readByte(CustomBufferedIndexInput.java:40)
	at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
	at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readVInt(CustomBufferedIndexInput.java:154)
	at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2357)
	at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1693)
	at org.apache.lucene.index.TermContext.build(TermContext.java:95)
	at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:166)
	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:687)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909)
	at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:253)
	at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:210)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:739)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:169)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.ProxyUserFilter.doFilter(ProxyUserFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:140)
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
	at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:145)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /solr/imageSearch_z3_l1/core_node2/data/index/_aosr_Lucene41_0.tim
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:39)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1439)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1391)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1367)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1340)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at org.apache.hadoop.ipc.Client.call(Client.java:1225)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at $Proxy9.getBlockLocations(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at $Proxy9.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:155)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
	... 61 more

Cloudera Employee MRM
Cloudera Employee
Posts: 15
Registered: ‎08-17-2013

Re: error processing READ_BLOCK operation

Given the Solr exception, this looks like SOLR-5693: https://issues.apache.org/jira/browse/SOLR-5693

We have the fix backported, and it will be included in the next release.

This is the first report I have seen of it causing that kind of disruption, though. Usually it just causes brief, intermittent problems with NRT searches.

Explorer
Posts: 20
Registered: ‎12-02-2013

Re: error processing READ_BLOCK operation

Thanks for the quick response. We see this more regularly when writes are heavy. What would be a workaround for this error? Should I increase the autocommit to a longer interval? It is currently set to 60000 ms. Any other workaround would really help. Thanks!

Explorer
Posts: 20
Registered: ‎12-02-2013

Re: error processing READ_BLOCK operation

Would this be fixed if I disable NRT caching by setting solr.hdfs.nrtcachingdirectory.enable=false?

Cloudera Employee MRM
Cloudera Employee
Posts: 15
Registered: ‎08-17-2013

Re: error processing READ_BLOCK operation

I think committing more often is actually probably your best bet. I think we actually default to 15 seconds with upstream Solr, both to minimize the default RAM usage of the transaction log and based on performance-impact testing another committer did.

There is no great workaround beyond not using NRT, but more frequent hard commits could make it less likely to occur.
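
For reference, here is a minimal sketch of what that looks like in solrconfig.xml, assuming the stock <updateHandler> layout; the 15000 ms value is just the default mentioned above, so tune it against your write load:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 15 s instead of the current 60000 ms -->
    <maxTime>15000</maxTime>
    <!-- flush segments to HDFS without reopening a searcher on every commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

Keeping openSearcher=false means the frequent hard commits only flush segments; visibility of new documents is still governed by your soft commit / NRT settings.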

Cloudera Employee MRM
Cloudera Employee
Posts: 15
Registered: ‎08-17-2013

Re: error processing READ_BLOCK operation

No, the NRTCachingDirectory is not really related to the issue.

Explorer
Posts: 20
Registered: ‎12-02-2013

Re: error processing READ_BLOCK operation

Thanks. So how do I make it not use NRT? Is there a configuration in solrconfig.xml, or somewhere else, where we can disable NRT?

New Contributor
Posts: 1
Registered: ‎09-12-2017

Re: error processing READ_BLOCK operation

Hi guys,

Did you find a solution for this error?

Sep 12, 11:26:10.497 AM   ERROR   org.apache.hadoop.hdfs.server.datanode.DataNode
worker1.scd.sasol.com:1004:DataXceiver error processing READ_BLOCK operation  src: /x.x.x.x:44258 dst: /x.x.x.x:1004
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-810251609-10.138.255.40-1492692599116:blk_1074274572_550782
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:456)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:236)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:745)

Cloudera Employee
Posts: 198
Registered: ‎01-09-2014

Re: error processing READ_BLOCK operation

That is likely a data corruption issue in HDFS. Run 'hdfs fsck /solr' to confirm.
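
For example (a sketch; adjust /solr to wherever your Solr indexes live in HDFS, and note that -delete is destructive):

hdfs fsck /solr                             # overall health summary for the /solr tree
hdfs fsck /solr -list-corruptfileblocks     # list blocks with no healthy replica left
hdfs fsck /solr -files -blocks -locations   # per-file block and replica locations

If fsck reports corrupt files that you can rebuild by re-indexing, they can be removed with 'hdfs fsck <path> -delete'.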

-pd