
BlockReaderFactory: I/O error constructing remote block reader in pyspark job

Hi all,

While executing a pyspark job, I am getting the expected output, but the console shows lots of warnings like the ones below.

 

16/04/21 06:57:49 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3492)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:838)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:753)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:374)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:624)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:851)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:419)
at parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:238)
at parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:234)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/04/21 06:57:49 WARN DFSClient: Failed to connect to Node01:1004 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out
java.net.ConnectException: Connection timed out
[stack trace identical to the one above]
16/04/21 06:57:49 INFO DFSClient: Successfully connected to Node2:1004 for BP-203832722-192.168.7.52-1369367151142:blk_1094311705_1099550253855
16/04/21 06:57:50 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out
[stack trace identical to the one above]
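The repeated "Connection timed out" means the client host cannot open a TCP connection to the DataNode's data-transfer port (1004 is a typical port for a secure DataNode). A quick way to check reachability from the machine running the job is a plain socket probe — a minimal sketch, where Node01 and Node2 are the hostnames from the log above and should be replaced with your own DataNodes:

```python
import socket

def probe(host, port, timeout=5.0):
    """Return 'open' if a TCP connection to host:port succeeds,
    otherwise a short description of the failure (DNS errors and
    refused/timed-out connections all surface as OSError)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as exc:
        return "unreachable: %s" % exc

# Hostnames taken from the log output above; substitute your own.
for node in ["Node01", "Node2"]:
    print(node, probe(node, 1004))
```

If a DataNode shows as unreachable here while `hadoop fs -ls` against the NameNode works, the usual suspects are a firewall on the data-transfer port, or DataNode hostnames that resolve differently on the client than inside the cluster.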

 

 

I can also see some warnings while executing simple hadoop commands:

 

$ hadoop fs -ls /user/testing
16/04/21 06:59:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/21 06:59:04 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 

 

Can someone help me resolve this?

 

Thanks

Kishore

3 REPLIES

Re: BlockReaderFactory: I/O error constructing remote block reader in pyspark job

Forgot to add a point here: I am getting the I/O errors for the Parquet file format, whereas for JSON I am not getting errors.
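One plausible reason Parquet hits this while JSON does not (a sketch of the mechanism, not a confirmed diagnosis for this cluster): before scanning any data, a Parquet reader must seek to the end of every file and read the footer metadata, so it immediately contacts the DataNodes holding each file's last block — which is exactly what the ParquetFileReader.readFooter frames in the stack trace show. JSON is read as line-delimited text, streamed front to back. The footer read itself is simple: the file ends with a 4-byte little-endian footer length followed by the b"PAR1" magic, per the Parquet format spec:

```python
import struct

def parquet_footer_length(path):
    """Read the footer length from the last 8 bytes of a Parquet file:
    a 4-byte little-endian length followed by the b'PAR1' magic."""
    with open(path, "rb") as f:
        f.seek(-8, 2)            # 2 = os.SEEK_END: jump to last 8 bytes
        length_bytes = f.read(4)  # little-endian footer length
        magic = f.read(4)         # must be b'PAR1'
        if magic != b"PAR1":
            raise ValueError("not a Parquet file (missing PAR1 magic)")
        return struct.unpack("<I", length_bytes)[0]
```

So every Parquet file in the job forces a connection to whichever DataNode holds its final block, and an unreachable node surfaces as these warnings during footer reads.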

Re: BlockReaderFactory: I/O error constructing remote block reader in pyspark job


I'm experiencing the exact same issue -- did you ever find a resolution?

Re: BlockReaderFactory: I/O error constructing remote block reader in pyspark job

Not yet.