Schedule the same Python script 100 times at a time

New Contributor

I have a Python script that takes an application ID as a parameter and downloads a JSON file from the web.

Here is the scenario I would like to explore: I have 6,000,000 application IDs, so I would like to execute this Python script 6 million times.

We are on Spark 1.6.2 and Python 2.7.5, with NiFi as the scheduling tool.

Please let me know what would be an ideal solution for my use case.


Expert Contributor

@Nara g,

There are not enough details to suggest an "ideal" solution. You can experiment with the number of threads used for the HTTP connections that pull the data from the web, and for executing your Python script. The right number depends on the resources each script run requires and on the resources available on your NiFi edge node or cluster (if these are PySpark jobs submitted in YARN mode).

But I would suggest:

1. Get all the JSON docs from the web and store them in HDFS (this removes any dependency on processing, and lets you reprocess the data if a bug is found).

2. Execute a single Python/Spark job on all of them in YARN mode.
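
As a rough illustration of step 1, here is a minimal sketch of downloading many documents from one long-running Python process with a pool of worker threads, rather than launching the script once per ID. Everything named here is an assumption for illustration: the URL pattern in `BASE_URL`, the function names, and the choice to write to a local directory (the files would still need to be moved into HDFS afterwards, for example with `hdfs dfs -put`). The `workers` value is the knob to tune against what the edge node and the remote web server can handle.

```python
import os
from multiprocessing.pool import ThreadPool

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7

# Hypothetical endpoint; replace with the real URL pattern for your service.
BASE_URL = "https://example.com/api/applications/{app_id}.json"


def fetch_json(app_id):
    """Return the raw response body for one application ID."""
    response = urlopen(BASE_URL.format(app_id=app_id), timeout=30)
    try:
        return response.read()
    finally:
        response.close()


def download_one(app_id, out_dir, fetch=fetch_json):
    """Download one document; return (app_id, error message or None)."""
    try:
        body = fetch(app_id)
        path = os.path.join(out_dir, "{0}.json".format(app_id))
        with open(path, "wb") as handle:
            handle.write(body)
        return app_id, None
    except Exception as exc:  # record the failure so the ID can be retried
        return app_id, str(exc)


def download_all(app_ids, out_dir, workers=100, fetch=fetch_json):
    """Download every ID using `workers` threads; return the failed IDs."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    pool = ThreadPool(workers)
    try:
        results = pool.imap_unordered(
            lambda app_id: download_one(app_id, out_dir, fetch), app_ids
        )
        failed = [app_id for app_id, err in results if err is not None]
    finally:
        pool.close()
        pool.join()
    return failed
```

Calling `download_all(app_ids, "/tmp/app_json", workers=100)` returns the list of IDs that failed, so they can be retried in a later pass without repeating the successful downloads. With 6,000,000 IDs, writing one small file per ID into a single directory is unlikely to scale well; batching the IDs into chunks with one output directory per chunk is one option.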