
Delay with Spark application

New Contributor

Hello everyone,

In the execution of a Spark script, which invokes several applications, we see different run times on each run: it went from about 8 hours on 18/11/2023, to about 17 hours on 09/12/2023, to about 33 hours on 02/12/2023.

Since this is a very heavy process, all daily schedules were disabled to give the application a chance to allocate all available resources.
The application runs sporadically on weekends, and we do not know exactly when the next run will be.

We have noticed that, among the Spark applications that make up the script, there is a delay of about 3 to 4 hours between the end of one job and the start of the next.

Do you have any suggestions about a possible cause of this anomaly? Thank you

1 ACCEPTED SOLUTION

Master Collaborator

Hi @Nardoleo 

It is very difficult to give an answer here because we don't know what each application is doing.

One thing you can do is check when each application starts and when it ends. In between, check how much data you are processing and which Spark job is taking the most time; then you can try to tune resources and apply some code optimizations.
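As a rough way to follow that advice, the wrapper below (a minimal Python sketch; the `spark-submit` command lists are placeholders, not from the thread) records when each application starts and ends, then computes the idle gap between consecutive applications. Hours-long gaps here would point at time lost *between* applications (scheduler, resource wait, driver startup) rather than inside the Spark jobs themselves:

```python
import subprocess
import time

def run_timed(cmd):
    """Run one application (e.g. a spark-submit command list) and
    return its (start, end) wall-clock timestamps in seconds."""
    start = time.time()
    subprocess.run(cmd, check=True)  # placeholder command, e.g. ["spark-submit", "app1.py"]
    end = time.time()
    return start, end

def gaps_between(intervals):
    """Given [(start, end), ...] in run order, return the idle gap
    (seconds) between each application's end and the next one's start."""
    return [nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(intervals, intervals[1:])]
```

For example, if app 1 ends at t=10 s and app 2 only starts at t=7210 s, `gaps_between` reports a 7200-second (2-hour) idle gap, which would be worth investigating before tuning the jobs themselves.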


4 REPLIES

Community Manager

@Nardoleo, welcome to our community! To help you get the best possible answer, I have tagged our Spark experts @RangaReddy @Babasaheb @ggangadharan, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:


Contributor
Hi @Nardoleo, a few issues can cause delays in jobs. Check that the queue you are using to submit the jobs has enough resources. Check the data size and data skewness, which can also delay a job. Also, monitor the network to see whether it is causing the delay.
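One quick check for the skew mentioned above: per-partition row counts can be obtained in Spark with `spark_partition_id()` (e.g. `df.groupBy(spark_partition_id()).count()`), and a simple max-to-mean ratio flags imbalance. Below is a minimal plain-Python sketch of that ratio (the counts in the example are illustrative, not from the thread):

```python
def skew_ratio(partition_counts):
    """Ratio of the largest partition's row count to the mean count.
    A value near 1.0 means balanced partitions; a much larger value
    means one partition dominates and its task will run far longer."""
    mean = sum(partition_counts) / len(partition_counts)
    return max(partition_counts) / mean
```

For instance, counts of `[10, 10, 10, 970]` give a ratio of 3.88: one task processes almost four times the average load, so the whole stage waits on it. In that case repartitioning or salting the skewed key is usually the next step.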

 

Community Manager

@Nardoleo, did any of the responses help resolve your query? If so, kindly mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.



Regards,

Vidya Sargur,
Community Manager

