New Contributor

Write to a file from many servers using Spark


 

I have a few machines that together form a Cloudera cluster.

I am using MobaXterm to connect to the servers.

I launch my code from one specific server (let's say server 1).

I am trying to create a txt file located on server 1,

and during the run each server should write to that same file.

So I did something like this:

Add the file:

import os

path = os.path.join("/home/user/", "clousters.txt")
with open(path, "w") as testFile:
    testFile.write("100")

sc.addFile(path)

Then I have an RDD that I pass to a Python function. The function does a lot of things; there are many sub-functions.

One of them gets a list of lists and tries to write the output into the clousters.txt file:

import csv
from pyspark import SparkFiles

def processuser(location_to_cluster_list):
    # ... many sub-functions; the relevant part writes the list of lists out:
    for enter in location_to_cluster_list:
        with open(SparkFiles.get("clousters.txt")) as f:
            writer = csv.writer(f)
            writer.writerow(enter)

But I get this error: File not open for writing

I am using Spark 1.3 with Python 2.6.

 

 

Cloudera Employee

Re: Write to a file from many servers using Spark

Spark's addFile is used to distribute read-only files to workers and is not intended to be written to. Also, using Python file operations will write files locally on each worker, and you would need to manually collect those files if that is the desired result. Instead, you can use a distributed file system, like HDFS, that is available to all workers. Or collect the results to the driver and write the file locally on the driver if the results are sufficiently small.
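For example, here is a rough sketch of both suggestions. It assumes rdd is your existing RDD, that processuser is changed to return one output row per record instead of writing to a file, and that the HDFS output path below is only a placeholder:

import csv

# Option 1: collect the results to the driver and write one local file there.
# Only do this if the results are small enough to fit in driver memory.
rows = rdd.map(processuser).collect()      # processuser returns a row, no file I/O
with open("/home/user/clousters.txt", "w") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

# Option 2: write directly to HDFS, which all workers can reach.
# Each partition becomes a part-* file under the output directory.
rdd.map(processuser) \
   .map(lambda row: ",".join(str(x) for x in row)) \
   .saveAsTextFile("hdfs:///user/youruser/clousters")    # placeholder path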
