Hi,
I am facing a scenario where I receive 2500 files (in Parquet format) with the same structure on a daily basis. I have to process all of these files in parallel in PySpark. What is the best approach to make sure that all the files are processed in parallel?
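
For context, here is a minimal sketch of what I have in mind so far, assuming the files land in a single directory per day (the path, app name, and column name below are placeholders, not my real ones):

    from pyspark.sql import SparkSession

    # Placeholder session; actual configuration depends on the cluster.
    spark = SparkSession.builder.appName("daily-parquet-load").getOrCreate()

    # Pointing the reader at the whole directory loads all 2500 files as
    # one DataFrame; Spark splits them into partitions and reads them in
    # parallel across the executors.
    df = spark.read.parquet("/landing/2024-01-01/")

    # Placeholder for the actual per-day processing logic.
    result = df.groupBy("some_column").count()
    result.write.mode("overwrite").parquet("/output/2024-01-01/")

Is relying on a single directory read like this enough to get full parallelism, or is there a better pattern for this volume of files?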