Contributor
Posts: 69
Registered: 01-24-2017

Writing from Spark to a shared file system

Can a Spark job running under YARN write a file not to HDFS (that works fine) but to a shared file system? (We use GPFS, but I doubt it matters.) So far I could not make it work.

 

The command that fails is:

 

ts.saveAsTextFile("file:///home/me/z11")

 

Notice that /home/me is mounted on all the nodes of the Hadoop cluster.

 

The error that I am getting is:

============

at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Mkdirs failed to create file:/home/me/z11/_temporary/0/_temporary/attempt_201704290002_0002_m_000000_15 (exists=false, cwd=file:/data/6/yarn/nm/usercache/ivy2/appcache/application_1490816225123_1660/container_e04_1490816225123_1660_01_000002)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447)

============

 

The empty directory /home/me/z11/_temporary/0/ was created but that's all.
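For reference, here is a minimal probe I could run to check whether the YARN container user on each executor can actually create directories under the shared mount (the probe path suffix and the SparkContext name are assumptions, not part of the failing job):

ts_probe.py:

import os
from pyspark import SparkContext

sc = SparkContext(appName="write-check")

def probe(_):
    # Hypothetical probe path under the shared mount from the question.
    target = "/home/me/z11_probe"
    try:
        if not os.path.isdir(target):
            os.makedirs(target)
        return [(os.uname()[1], "ok")]
    except OSError as e:
        return [(os.uname()[1], str(e))]

# One probe per partition, so several executors report back.
print(sc.parallelize(range(8), 8).mapPartitions(probe).collect())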

 

Contributor
Posts: 25
Registered: 06-13-2017

Re: Writing from Spark to a shared file system

You might have to add your GPFS libraries to your SPARK_CLASSPATH and LD_LIBRARY_PATH so the executors can load them.
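A minimal sketch of how that could be passed to the executors from PySpark, assuming the GPFS (Spectrum Scale) jar and native libraries live under /usr/lpp/mmfs -- both paths and the jar name are assumptions, substitute your actual install locations (the driver-side classpath usually still has to be set in spark-env.sh or on the spark-submit command line, since the driver JVM is already running by the time SparkConf is read):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("gpfs-write-test")
        # Assumed GPFS connector jar location -- adjust to your installation.
        .set("spark.executor.extraClassPath", "/usr/lpp/mmfs/hadoop/gpfs-connector.jar")
        # Assumed native library directory for the executors.
        .set("spark.executor.extraLibraryPath", "/usr/lpp/mmfs/lib")
        .set("spark.executorEnv.LD_LIBRARY_PATH", "/usr/lpp/mmfs/lib"))

sc = SparkContext(conf=conf)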