Spark2 Unable to write to HDFS (Or Local)

Rising Star

I am attempting to leverage Spark2 to write a Parquet file to HDFS, but am receiving the following error:

Error summary: RemoteException: File /user/cloudera/1000genome/processed/test.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311418_0001_m_000000_3/part-r-00000-c0069d7a-101f-4bf9-9dc9-22b362285b12.snappy.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3351)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)

I had initially started with the Cloudera Quickstart VM, upgraded Cloudera Manager and CDH to 5.10, and then installed Spark2. Is there something I'm missing that would prevent Spark2 from writing to HDFS? Spark2 is also unable to write this file locally, failing with an IOException, though it has no problem reading the file.

I am calling my Spark script with the line below:

spark2-submit --master yarn-client tutorial.py
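
As an aside, Spark 2 deprecates the combined yarn-client master string in favor of a separate deploy-mode flag, so the equivalent invocation should be:

spark2-submit --master yarn --deploy-mode client tutorial.py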


Master Collaborator

It has nothing to do with Spark; this is the kind of error you get when HDFS itself is not healthy. The DataNode may be unable to start, may still be starting, or may have some other problem.
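
A couple of quick checks, assuming you have shell access on the cluster:

hdfs dfsadmin -report   # shows live DataNodes and "DFS Remaining"; a lone DataNode with no free space also produces this error
hdfs fsck /             # reports missing, corrupt, or under-replicated blocks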

Rising Star

Odd; hdfs dfs -ls and the like all seem to work. Likewise, if I run through the "Getting Started" tutorial I don't seem to encounter any issues.

Regarding the IOException when attempting to write to local disk (see below), is this just down to the user running Spark2 not having write privileges to that location?

Error summary: IOException: Mkdirs failed to create file:/home/cloudera/Documents/hail-workspace/source/out.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311444_0001_m_000000_3
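
If permissions are the culprit, one quick way to test the theory (this assumes the executor containers run as the yarn user, the default on an unsecured cluster) would be to attempt the write as that user:

sudo -u yarn touch /home/cloudera/Documents/hail-workspace/source/write-test   # should fail the same way if yarn cannot write here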