New Contributor
Posts: 2
Registered: ‎03-31-2017

Spark2 Unable to write to HDFS (Or Local)

I am attempting to leverage Spark2 to write a Parquet file to HDFS, but am receiving the following error:

 

Error summary: RemoteException: File /user/cloudera/1000genome/processed/test.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311418_0001_m_000000_3/part-r-00000-c0069d7a-101f-4bf9-9dc9-22b362285b12.snappy.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3351)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)

 

I had initially started with the Cloudera QuickStart VM, upgraded Cloudera Manager and CDH to 5.10, and then installed Spark2. Is there something I'm missing that would prevent Spark2 from writing to HDFS? Spark2 is also unable to write this file locally (it fails with an IOException), but it has no problem reading the file.

I am calling my Spark script with the command below:

 

spark2-submit --master yarn-client tutorial.py

Cloudera Employee
Posts: 418
Registered: ‎08-11-2014

Re: Spark2 Unable to write to HDFS (Or Local)

It has nothing to do with Spark. This is the kind of error you get when HDFS itself is not working: it may have failed to start, may still be starting, or may have some other problem.
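For what it's worth, "could only be replicated to 0 nodes" generally means no DataNode could accept the block. A few quick checks, sketched below (guarded in case the hdfs client isn't on the PATH); note that a directory listing only talks to the NameNode, while a write has to place blocks on a DataNode, so -ls can succeed even when writes fail:

```shell
# Listing succeeds via the NameNode alone; writes need a healthy DataNode
# with free space, so check DataNode status and disk usage explicitly.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfsadmin -report   # is the DataNode live, and does "DFS Remaining" show free space?
  hdfs fsck /             # overall block and replication health
fi
df -h                     # on a single-node QuickStart VM, a full disk is a common culprit
```

With exactly one DataNode running and none excluded, this message very often just means that DataNode has no usable space left.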

New Contributor
Posts: 2
Registered: ‎03-31-2017

Re: Spark2 Unable to write to HDFS (Or Local)

Odd, hdfs dfs -ls and the like all seem to be working. Likewise, if I run through the "Getting Started" tutorial I don't seem to encounter any issues.

 

Regarding the IOException when attempting to write to local disk (see below): is this just a matter of the user running Spark2 not having write privileges to that location?

 

Error summary: IOException: Mkdirs failed to create file:/home/cloudera/Documents/hail-workspace/source/out.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311444_0001_m_000000_3
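If it is a permissions problem, a quick local check needs nothing but the Python standard library; this is just a sketch, with the path taken from the error above. One caveat: with --master yarn-client the tasks run inside YARN containers, which on an unsecured cluster typically run as the yarn user rather than your shell user, so the permissions that matter may not be your own.

```python
import os

# Output directory from the Mkdirs error above (adjust as needed).
target = "/home/cloudera/Documents/hail-workspace/source/out.vds"

# Walk up to the nearest ancestor that actually exists, then ask whether
# the current user may write there; Mkdirs fails exactly when it may not.
path = target
while not os.path.exists(path):
    path = os.path.dirname(path)

print("nearest existing ancestor:", path)
print("writable by current user:", os.access(path, os.W_OK))
```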
