Member since: 01-14-2015
Posts: 23
Kudos Received: 3
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 43435 | 02-15-2017 08:14 AM |
02-15-2017
08:14 AM
It seems to be a library conflict between the open source Spark 1.6.1 artifacts and Cloudera's Spark. I changed my POM file to use Spark version 1.6.0-cdh5.9.1, and now it is working fine.

P.S.: If you run into the following error, you might have "spark.shuffle.encryption.enabled" set to true:

Caused by: java.lang.NullPointerException
 at com.intel.chimera.stream.CryptoOutputStream.<init>(CryptoOutputStream.java:124)
 at com.intel.chimera.stream.CryptoOutputStream.<init>(CryptoOutputStream.java:113)
 at com.intel.chimera.stream.CryptoOutputStream.<init>(CryptoOutputStream.java:102)
 at com.intel.chimera.stream.CryptoOutputStream.<init>(CryptoOutputStream.java:89)
 at org.apache.spark.crypto.CryptoStreamUtils$.createCryptoOutputStream(CryptoStreamUtils.scala:51)
 at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:104)
 at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:89)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
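For reference, the POM change was essentially switching the Spark dependencies to the CDH-packaged artifacts. A minimal sketch is below; the artifact names, the Scala suffix, and the repository URL are illustrative and should be checked against the Cloudera Maven repository for your cluster:

<!-- Pull Spark from Cloudera's repository instead of Maven Central. -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <!-- CDH-packaged Spark core and SQL, matching the cluster's Spark build. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0-cdh5.9.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0-cdh5.9.1</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

If you do hit the NullPointerException from the P.S., the flag it refers to can usually be overridden at submit time, for example with spark-submit --conf spark.shuffle.encryption.enabled=false (assuming your cluster configuration allows overriding it).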
01-30-2017
11:42 AM
We are using Spark 1.6.1 on a CDH 5.5 cluster. The job worked fine with Kerberos, but when we implemented Encryption at Rest we ran into the issue below with this write:

Df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path);

I have already tried setting these values, with no success:

sparkContext.hadoopConfiguration().set("parquet.enable.summary-metadata", "true"); // tried both "true" and "false"
sparkContext.hadoopConfiguration().setInt("parquet.metadata.read.parallelism", 1);
SparkConf.set("spark.sql.parquet.mergeSchema", "false");
SparkConf.set("spark.sql.parquet.filterPushdown", "true");

Ideally I would like to set summary-metadata to false, since it would save some time during the write.

17/01/30 18:37:54 WARN hadoop.ParquetOutputCommitter: could not write summary file for hdfs://abc
java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://abc/Partition=O/part-r-00003-95adb09f-627f-42fe-9b89-7631226e998f.gz.parquet; isDirectory=false; length=12775; replication=3; blocksize=134217728; modification_time=1485801467817; access_time=1485801467179; owner=bigdata-service; group=bigdata; permission=rw-rw----; isSymlink=false}
 at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)
 at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:262)
 at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:56)
 at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
 at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
 at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:149)
 at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
 at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
 at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
 at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)
 at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
 at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
 at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
 at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
 at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
 at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:334)
 at thomsonreuters.northstar.main.ParquetFileWriter.writeDataToParquet(ParquetFileWriter.java:173)
 at thomsonreuters.northstar.main.SparkProcessor.process(SparkProcessor.java:128)
 at thomsonreuters.northstar.main.NorthStarMain.main(NorthStarMain.java:129)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)
Caused by: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://abc/Partition=O/part-r-00003-95adb09f-627f-42fe-9b89-7631226e998f.gz.parquet; isDirectory=false; length=12775; replication=3; blocksize=134217728; modification_time=1485801467817; access_time=1485801467179; owner=bigdata-app-ooxp-service; group=bigdata; permission=rw-rw----; isSymlink=false}
 at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:239)
 at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'version' was not found in serialized data! Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)
 at org.apache.parquet.format.Util.read(Util.java:216)
 at org.apache.parquet.format.Util.readFileMetaData(Util.java:73)
 at org.apache.parquet.format.converter.ParquetMetadataConverter$2.visit(ParquetMetadataConverter.java:515)
 at org.apache.parquet.format.converter.ParquetMetadataConverter$2.visit(ParquetMetadataConverter.java:512)
 at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:433)
 at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:512)
 at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:430)
 at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)
 ... 5 more
Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'version' was not found in serialized data! Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)
 at org.apache.parquet.format.FileMetaData.read(FileMetaData.java:881)
 at org.apache.parquet.format.Util.read(Util.java:213)
 ... 12 more
17/01/30 18:37:54 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.nio.channels.ClosedByInterruptException
 at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
 at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
 at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
 at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
 at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
 at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
 at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
 at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)
 at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:176)
 at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:649)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:66)
 at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)
 at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)
 at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
17/01/30 18:37:54 WARN hdfs.DFSClient: Failed to connect to /10.51.29.22:1004 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
(The same "I/O error constructing remote block reader" warning, the same ClosedByInterruptException stack trace, and the same "Failed to connect ... add to deadNodes and continue" warning then repeat for datanodes /10.51.29.217:1004 and /10.51.29.218:1004.)
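For anyone reproducing this, below is a minimal sketch of the write path with the settings listed above, put together in one place. The input source, class name, and variable names are hypothetical; only the Spark/Hadoop properties and the failing write call come from the post:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;

public class ParquetWriteSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("parquet-write-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Summary metadata is what ParquetOutputCommitter.writeMetaDataFile is trying
        // (and failing) to produce; "false" skips that step entirely.
        sc.hadoopConfiguration().set("parquet.enable.summary-metadata", "false");
        sc.hadoopConfiguration().setInt("parquet.metadata.read.parallelism", 1);

        SQLContext sqlContext = new SQLContext(sc);
        sqlContext.setConf("spark.sql.parquet.mergeSchema", "false");
        sqlContext.setConf("spark.sql.parquet.filterPushdown", "true");

        String path = "hdfs://abc";                               // placeholder path from the post
        DataFrame df = sqlContext.read().json("hdfs://abc/input"); // hypothetical input source

        // The failing call from the post: partitioned append into the encrypted location.
        df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path);
    }
}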
Labels:
- Apache Spark
- Kerberos
03-19-2015
07:52 PM
Apache just released Spark 1.3.0 and I would like to install it on my CDH cluster. I know it will be added to the CDH 5.4 release, but is there any other way to get it now? Alternatively, is there a way to upgrade the existing Spark standalone installation from 1.2.0 to 1.3.0?
01-15-2015
12:33 PM
I did add hive.server2.authentication with the value NOSASL to /var/run/cloudera-scm-agent/process/168-hue-HUE_SERVER/hive-conf/hive-site.xml (the exact property entry is shown below), but for the change to propagate I need to restart, and the restart creates a whole new Hue Server process directory without those settings. Do you suggest any temporary fix for the time being? Anu
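The entry, in standard hive-site.xml property syntax (the name and value are the ones described above; nothing else in the generated file was changed):

<property>
  <name>hive.server2.authentication</name>
  <value>NOSASL</value>
</property>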
01-15-2015
11:43 AM
Hue version: 3.7.0. hive-site.xml -> http://www.hastebin.com/iquqemujan.xml Also, for some reason Hue does not copy the hive-site.xml specified; instead it uses some default version, if that makes any sense. I am using the Cloudera Quickstart VM version 5.3. Anu
01-14-2015
01:23 PM
Hue is pointing to the Hive server as well as the Hive conf folder:

# Host where Hive server Thrift daemon is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=quickstart.cloudera

# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000

# Hive configuration directory, where hive-site.xml is located
#hive_conf_dir=/etc/hive/conf
hive_conf_dir=/etc/hive/conf.cloudera.hive
01-14-2015
12:15 PM
Hi, we are using the NOSASL setting for hive.server2.authentication. The Impala editor in Hue is able to access Hive tables, but the Hive editor does not work; it just hangs with no error. Any ideas? If we change hive.server2.authentication back to the default, the Hive editor works, so it is definitely an authentication issue.
Labels:
- Apache Hive
- Apache Impala
- Cloudera Hue