Created on 03-31-2017 02:35 PM - edited 09-16-2022 04:23 AM
I am attempting to leverage Spark2 to write a Parquet file to HDFS, but am receiving the following error:
Error summary: RemoteException: File /user/cloudera/1000genome/processed/test.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311418_0001_m_000000_3/part-r-00000-c0069d7a-101f-4bf9-9dc9-22b362285b12.snappy.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3351)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
I initially started with the Cloudera Quickstart VM, upgraded Cloudera Manager and CDH to 5.10, and then installed Spark2. Is there something I'm missing that would prevent Spark2 from writing to HDFS? Spark2 is also unable to write this file locally (it fails with an IOException), though it has no problem reading the file.
I am calling my Spark script with the line below:
spark2-submit --master yarn-client tutorial.py
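As an aside, yarn-client as a master value is deprecated in Spark 2 (spark2-submit prints a warning about it); the equivalent form of the same call would be:
spark2-submit --master yarn --deploy-mode client tutorial.py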
Created 03-31-2017 02:39 PM
It has nothing to do with Spark. This is the kind of error you get when HDFS is not working. HDFS may be unable to start, may still be starting, or may be having some other problem.
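A quick way to confirm is to ask the NameNode what it sees; for example:
hdfs dfsadmin -report
should report 1 live DataNode with non-zero remaining DFS capacity. If capacity is zero or the node is dead, check the DataNode log for errors (on a CDH install that is typically under /var/log/hadoop-hdfs/).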
Created 03-31-2017 02:49 PM
Odd, hdfs dfs -ls and the like all seem to be working. Likewise, if I run through the "Getting Started" tutorial I don't seem to encounter any issues.
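That said, -ls only exercises NameNode metadata, while a write has to allocate blocks on a DataNode just like the Spark job does. As a sanity check (the target path here is arbitrary), I can try:
hdfs dfs -put /etc/hosts /tmp/hdfs-write-test
If that fails with the same "could only be replicated to 0 nodes" message, the problem is on the HDFS side rather than the Spark side.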
Regarding the IOException when attempting to write to local disk (see below), is this just tied to the user behind Spark2 not having write privileges to that location?
Error summary: IOException: Mkdirs failed to create file:/home/cloudera/Documents/hail-workspace/source/out.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311444_0001_m_000000_3
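If the privileges theory is right, it should reproduce outside Spark: in yarn-client mode the executors run inside YARN containers, commonly as the yarn user, so a file:/ destination has to be writable by that user, not by cloudera. Assuming the containers do run as yarn, a throwaway directory name can test it:
sudo -u yarn mkdir -p /home/cloudera/Documents/hail-workspace/source/test-dir
A "Permission denied" there would mean the Mkdirs failure is just local filesystem permissions.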