I have a Python script that takes an application ID as a parameter and downloads a JSON file from the web.
Here is the scenario I would like to explore.
I have 6,000,000 application IDs, so I would need to execute this Python script 6 million times.
We are on Spark 1.6.2 and Python 2.7.5, with NiFi as the scheduling tool.
Please let me know what would be an ideal solution for my use case.
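
To make the scenario concrete, here is a rough sketch of one possible direction: call the download logic as a function over batches of IDs instead of launching the script once per ID. This is only an illustration under assumptions. `fetch_json`, `chunked`, `download_batch`, the batch size and the worker count are all hypothetical names and values, not part of the existing script.

```python
from multiprocessing.pool import ThreadPool  # stdlib in Python 2.7 and 3


def fetch_json(app_id):
    """Hypothetical stand-in for the existing script's download logic.

    Assumed to take one application ID and return the parsed JSON.
    """
    raise NotImplementedError("replace with the real download code")


def chunked(ids, size):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def download_batch(app_ids, fetch=fetch_json, workers=16):
    """Download one batch of IDs concurrently with a small thread pool.

    Returns a list of (app_id, result) pairs in the same order as `app_ids`.
    Threads are used because the work is assumed to be I/O bound.
    """
    pool = ThreadPool(workers)
    try:
        return pool.map(lambda app_id: (app_id, fetch(app_id)), app_ids)
    finally:
        pool.close()
        pool.join()
```

With something like this, each batch from `chunked(all_ids, 1000)` could be handed to `download_batch`. On Spark, the same idea might be expressed by distributing the IDs with `sc.parallelize` and downloading inside `mapPartitions`, so that each partition processes many IDs per task. Whether that fits depends on how much load the web service can tolerate.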