Support Questions

Write to a file from many servers using Spark

New Contributor

 

I have a few computers combined together into a Cloudera cluster.

I am using MobaXterm to connect to the servers.

I launch my code from a specific server (let's say server 1).

I am trying to create a txt file located on server 1, and during the job each server will write to the same file.

So I did something like this:

Add the file:

import os

path = os.path.join("/home/user/", "clousters.txt")

# create the file on server 1 and register it with Spark
with open(path, "w") as testFile:
    testFile.write("100")

sc.addFile(path)

Then I have an RDD which I send to a Python function. The function does a lot of things - there are many sub-functions. One of them gets a list of lists and tries to write the output into the clousters txt file:

import csv
from pyspark import SparkFiles

def processuser(location_to_cluster_list):
    for enter in location_to_cluster_list:
        with open(SparkFiles.get("clousters.txt")) as f:
            writer = csv.writer(f)
            writer.writerow(enter)

But I get this error: "File not open for writing".

I am using Spark 1.3 with Python 2.6.

 

 

1 REPLY

Re: Write to a file from many servers using Spark

Expert Contributor

Spark's addFile is used to distribute read-only files to workers and is not intended to be written to. Also, using Python file operations will write files locally on each worker, and you would need to manually collect those files if that is the desired result. Instead, you can use a distributed file system, like HDFS, that is available to all workers. Or collect the results to the driver and write the file locally on the driver if the results are sufficiently small.
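
For illustration, here is a minimal PySpark sketch of both suggestions, assuming sc is your existing SparkContext; the RDD contents, the HDFS output path, and the local path are made up for the example (the RDD methods shown are available in Spark 1.3).

import csv
import os

# rows_rdd stands in for whatever RDD the job produces; each record is
# assumed to be a list of values that should become one output row.
rows_rdd = sc.parallelize([[1, 2, 3], [4, 5, 6]])

# Option 1: write to a distributed file system such as HDFS that every
# worker can reach; each worker writes its own part-* file under the
# output directory.
rows_rdd.map(lambda row: ",".join(str(x) for x in row)) \
        .saveAsTextFile("hdfs:///user/someuser/clousters_output")

# Option 2: if the result is small enough to fit on the driver, collect it
# there and write a single local file on the driver machine (server 1).
path = os.path.join("/home/user/", "clousters.txt")
with open(path, "w") as out:
    writer = csv.writer(out)
    for row in rows_rdd.collect():
        writer.writerow(row)

In both cases nothing writes to the copies that addFile distributed; those are read-only local copies on each worker.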