<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Saving parquet file in Spark giving error in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-parquet-file-in-Spark-giving-error/m-p/50159#M52916</link>
    <description>&lt;P&gt;We are using Spark 1.6.1 on a CDH 5.5 cluster. The job worked fine with Kerberos, but after we enabled Encryption at Rest we ran into the following issue with this write:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have already tried setting these values, with no success:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;sparkContext.hadoopConfiguration().set("parquet.enable.summary-metadata", "false"); // also tried "true"&lt;BR /&gt;sparkContext.hadoopConfiguration().setInt("parquet.metadata.read.parallelism", 1);&lt;/P&gt;&lt;P&gt;SparkConf.set("spark.sql.parquet.mergeSchema","false");&lt;BR /&gt;SparkConf.set("spark.sql.parquet.filterPushdown","true");&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ideally I would like to set parquet.enable.summary-metadata to false, as it will save some time during the write.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;17/01/30 18:37:54 WARN hadoop.ParquetOutputCommitter: could not write summary file for hdfs://abc&lt;BR /&gt;java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://abc/Partition=O/part-r-00003-95adb09f-627f-42fe-9b89-7631226e998f.gz.parquet; isDirectory=false; length=12775; replication=3; blocksize=134217728; modification_time=1485801467817; access_time=1485801467179; owner=bigdata-service; group=bigdata; permission=rw-rw----; isSymlink=false}&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:262)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:56)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)&lt;BR /&gt;at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:149)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)&lt;BR /&gt;at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)&lt;BR /&gt;at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)&lt;BR /&gt;at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)&lt;BR /&gt;at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)&lt;BR /&gt;at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)&lt;BR /&gt;at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)&lt;BR /&gt;at 
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:334)&lt;BR /&gt;at thomsonreuters.northstar.main.ParquetFileWriter.writeDataToParquet(ParquetFileWriter.java:173)&lt;BR /&gt;at thomsonreuters.northstar.main.SparkProcessor.process(SparkProcessor.java:128)&lt;BR /&gt;at thomsonreuters.northstar.main.NorthStarMain.main(NorthStarMain.java:129)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:497)&lt;BR /&gt;at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)&lt;BR /&gt;Caused by: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://abc/Partition=O/part-r-00003-95adb09f-627f-42fe-9b89-7631226e998f.gz.parquet; isDirectory=false; length=12775; replication=3; blocksize=134217728; modification_time=1485801467817; access_time=1485801467179; owner=bigdata-app-ooxp-service; group=bigdata; permission=rw-rw----; isSymlink=false}&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:239)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'version' was not found in serialized data! 
Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)&lt;BR /&gt;at org.apache.parquet.format.Util.read(Util.java:216)&lt;BR /&gt;at org.apache.parquet.format.Util.readFileMetaData(Util.java:73)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter$2.visit(ParquetMetadataConverter.java:515)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter$2.visit(ParquetMetadataConverter.java:512)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:433)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:512)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:430)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;... 5 more&lt;BR /&gt;Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'version' was not found in serialized data! Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)&lt;BR /&gt;at org.apache.parquet.format.FileMetaData.read(FileMetaData.java:881)&lt;BR /&gt;at org.apache.parquet.format.Util.read(Util.java:213)&lt;BR /&gt;... 
12 more&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.22:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR 
/&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at 
org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.217:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at 
org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at 
org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.218:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR 
/&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:58:58 GMT</pubDate>
    <dc:creator>morfious902002</dc:creator>
    <dc:date>2022-09-16T10:58:58Z</dc:date>
    <item>
      <title>Saving parquet file in Spark giving error</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-parquet-file-in-Spark-giving-error/m-p/50159#M52916</link>
      <description>&lt;P&gt;We are using Spark 1.6.1 on a CDH 5.5 cluster. The job worked fine with Kerberos, but after we enabled Encryption at Rest we ran into the following issue with this write:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have already tried setting these values, with no success:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;sparkContext.hadoopConfiguration().set("parquet.enable.summary-metadata", "false"); // also tried "true"&lt;BR /&gt;sparkContext.hadoopConfiguration().setInt("parquet.metadata.read.parallelism", 1);&lt;/P&gt;&lt;P&gt;SparkConf.set("spark.sql.parquet.mergeSchema","false");&lt;BR /&gt;SparkConf.set("spark.sql.parquet.filterPushdown","true");&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ideally I would like to set parquet.enable.summary-metadata to false, as it will save some time during the write.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;17/01/30 18:37:54 WARN hadoop.ParquetOutputCommitter: could not write summary file for hdfs://abc&lt;BR /&gt;java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://abc/Partition=O/part-r-00003-95adb09f-627f-42fe-9b89-7631226e998f.gz.parquet; isDirectory=false; length=12775; replication=3; blocksize=134217728; modification_time=1485801467817; access_time=1485801467179; owner=bigdata-service; group=bigdata; permission=rw-rw----; isSymlink=false}&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:262)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:56)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)&lt;BR /&gt;at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:149)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)&lt;BR /&gt;at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)&lt;BR /&gt;at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)&lt;BR /&gt;at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)&lt;BR /&gt;at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)&lt;BR /&gt;at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)&lt;BR /&gt;at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)&lt;BR /&gt;at 
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:334)&lt;BR /&gt;at thomsonreuters.northstar.main.ParquetFileWriter.writeDataToParquet(ParquetFileWriter.java:173)&lt;BR /&gt;at thomsonreuters.northstar.main.SparkProcessor.process(SparkProcessor.java:128)&lt;BR /&gt;at thomsonreuters.northstar.main.NorthStarMain.main(NorthStarMain.java:129)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:497)&lt;BR /&gt;at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)&lt;BR /&gt;Caused by: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://abc/Partition=O/part-r-00003-95adb09f-627f-42fe-9b89-7631226e998f.gz.parquet; isDirectory=false; length=12775; replication=3; blocksize=134217728; modification_time=1485801467817; access_time=1485801467179; owner=bigdata-app-ooxp-service; group=bigdata; permission=rw-rw----; isSymlink=false}&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:239)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'version' was not found in serialized data! 
Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)&lt;BR /&gt;at org.apache.parquet.format.Util.read(Util.java:216)&lt;BR /&gt;at org.apache.parquet.format.Util.readFileMetaData(Util.java:73)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter$2.visit(ParquetMetadataConverter.java:515)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter$2.visit(ParquetMetadataConverter.java:512)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:433)&lt;BR /&gt;at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:512)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:430)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;... 5 more&lt;BR /&gt;Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'version' was not found in serialized data! Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)&lt;BR /&gt;at org.apache.parquet.format.FileMetaData.read(FileMetaData.java:881)&lt;BR /&gt;at org.apache.parquet.format.Util.read(Util.java:213)&lt;BR /&gt;... 
12 more&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.22:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR 
/&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at 
org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.217:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at 
org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at 
org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.218:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException&lt;BR /&gt;java.nio.channels.ClosedByInterruptException&lt;BR /&gt;at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)&lt;BR /&gt;at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)&lt;BR /&gt;at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)&lt;BR /&gt;at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)&lt;BR /&gt;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)&lt;BR 
/&gt;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)&lt;BR /&gt;at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)&lt;BR /&gt;at java.io.FilterInputStream.read(FilterInputStream.java:83)&lt;BR /&gt;at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)&lt;BR /&gt;at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:58:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-parquet-file-in-Spark-giving-error/m-p/50159#M52916</guid>
      <dc:creator>morfious902002</dc:creator>
      <dc:date>2022-09-16T10:58:58Z</dc:date>
    </item>
    <item>
      <title>Re: Saving parquet file in Spark giving error</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-parquet-file-in-Spark-giving-error/m-p/50961#M52917</link>
      <description>&lt;P&gt;This seems to be a library conflict between open-source Spark 1.6.1 and Cloudera's Spark build.&lt;/P&gt;&lt;P&gt;I changed the Spark version in my POM file to:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Spark version: 1.6.0-cdh5.9.1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It is now working fine.&lt;/P&gt;&lt;P&gt;P.S. If you run into the following error, you might have "spark.shuffle.encryption.enabled" set to true.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Caused by: java.lang.NullPointerException&lt;BR /&gt;at com.intel.chimera.stream.CryptoOutputStream.&amp;lt;init&amp;gt;(CryptoOutputStream.java:124)&lt;BR /&gt;at com.intel.chimera.stream.CryptoOutputStream.&amp;lt;init&amp;gt;(CryptoOutputStream.java:113)&lt;BR /&gt;at com.intel.chimera.stream.CryptoOutputStream.&amp;lt;init&amp;gt;(CryptoOutputStream.java:102)&lt;BR /&gt;at com.intel.chimera.stream.CryptoOutputStream.&amp;lt;init&amp;gt;(CryptoOutputStream.java:89)&lt;BR /&gt;at org.apache.spark.crypto.CryptoStreamUtils$.createCryptoOutputStream(CryptoStreamUtils.scala:51)&lt;BR /&gt;at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:104)&lt;BR /&gt;at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)&lt;BR /&gt;at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)&lt;BR /&gt;at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)&lt;BR /&gt;at org.apache.spark.scheduler.Task.run(Task.scala:89)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 16:14:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-parquet-file-in-Spark-giving-error/m-p/50961#M52917</guid>
      <dc:creator>morfious902002</dc:creator>
      <dc:date>2017-02-15T16:14:53Z</dc:date>
    </item>
  </channel>
</rss>

