Expert Contributor
Posts: 253
Registered: 01-25-2017

Relation between open files for the spark job and the executor cores and memory

Hi Guys,

 

We have a Spark job whose open files we need to increase from 300 to 1200. What is the impact on the cores and memory configured for the job?

 

Is there any equation for estimating the number of open files relative to the number of cores?

New Contributor
Posts: 1
Registered: 06-22-2017

Re: Relation between open files for the spark job and the executor cores and memory

I don't know for sure, but I doubt it would have any notable impact on the core or memory configuration you have set for the driver or executors. Just out of curiosity, what is the need for controlling the max open files?
Expert Contributor
Posts: 253
Registered: 01-25-2017

Re: Relation between open files for the spark job and the executor cores and memory

Increasing parallelism.

 

Posts: 630
Topics: 3
Kudos: 102
Solutions: 66
Registered: 08-16-2016

Re: Relation between open files for the spark job and the executor cores and memory

I think what you are looking for is the number of tasks that each executor can handle. Tasks won't correspond directly to files, but increasing the number of tasks per executor or increasing the number of executors will boost parallelism.
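
 

A minimal sketch in Scala of what that looks like in practice, assuming an existing SparkSession named spark and a hypothetical HDFS input path; the point is that the task count per stage follows the partition count, not the open-file count:

// Assumes an existing SparkSession called `spark`; the input path is hypothetical.
val rdd = spark.sparkContext.textFile("hdfs:///data/input")

// Each partition becomes one task in a stage.
println(s"Partitions (tasks per stage): ${rdd.getNumPartitions}")

// Repartitioning raises the number of tasks, which is what actually
// spreads the work across more executor cores.
val wider = rdd.repartition(1200)
println(s"Partitions after repartition: ${wider.getNumPartitions}")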

 

The number of cores per executor determines the number of tasks that an executor can handle at a time. It will wait until a task has finished before taking on a new one. So you need to find the number of cores that gives the best number of tasks running on each executor, with the right amount of memory per task to handle your data. Let's say your job launches 1200 tasks. You could configure it with 240 executors and 5 cores per executor (240 x 5 = 1200 task slots), which allows all tasks to run in parallel at the same time. If 240 is too much, drop it down, but expect a slower run time for the job.
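
 

For reference, a rough Scala sketch of how that 240-executor / 5-core layout could be expressed; the app name and the 8g memory figure are placeholders I picked, not values from this thread:

import org.apache.spark.sql.SparkSession

// Placeholders: the app name and executor memory below are illustrative only.
val spark = SparkSession.builder()
  .appName("parallelism-example")
  .config("spark.executor.instances", "240")  // number of executors
  .config("spark.executor.cores", "5")        // concurrent tasks per executor
  .config("spark.executor.memory", "8g")      // size this for your data
  .getOrCreate()
// 240 executors x 5 cores = 1200 task slots, so all 1200 tasks can run at once.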

 

Tip: I used 5 in the example, as others have shown that more than five cores per executor provides diminishing returns on HDFS throughput.

Expert Contributor
Posts: 253
Registered: 01-25-2017

Re: Relation between open files for the spark job and the executor cores and memory

@mbigelow I read the best practices and fine-tuned my Spark job. No, I'm fine with the executors, cores per executor, and memory. I had a Spark job that ran with 50X1X8G and 600 open files and it was running fine; when I increased the open files to 1200 the job started to fail. I'm always trying to find the suitable executors and cores per executor.
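
 

For readers following along, here is one reading of that 50X1X8G shorthand, assuming it means 50 executors x 1 core x 8 GB per executor (my interpretation, not confirmed above), sketched in Scala:

import org.apache.spark.SparkConf

// Assumes "50X1X8G" = 50 executors x 1 core x 8 GB per executor.
val conf = new SparkConf()
  .set("spark.executor.instances", "50")
  .set("spark.executor.cores", "1")
  .set("spark.executor.memory", "8g")
// With 1 core per executor, at most 50 tasks run concurrently,
// no matter how many input files or partitions the job has.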

 

BTW, finding the right configuration isn't a simple task, and sometimes I find myself resorting to trial and error.

Posts: 630
Topics: 3
Kudos: 102
Solutions: 66
Registered: 08-16-2016

Re: Relation between open files for the spark job and the executor cores and memory

Can you elaborate on what you did to "increase the open files to 1200"?