Spark Machine Learning for performance prediction


Hi

I am totally new to Spark ML. I capture batch processing information from Spark Streaming and write it to a file. For each batch I capture the following:

(FYI, each batch in Spark Streaming is a job set, i.e. a set of jobs.)

BatchTime

BatchStarted

FirstJobStartTime

LastJobCompletionTime

FirstJobSchedulingDelay

TotalJobProcessingTime (time to process all jobs in a batch)

NumberOfRecords

SubmissionTime

TotalDelay (total execution time for a batch, from submission through scheduling and processing)
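
For anyone working from the same data, the per-batch fields listed above could be loaded along these lines. This is only a sketch: the post does not say how the file is laid out, so the CSV format with a header row, the millisecond units, and the names `BatchMetrics` and `load_batches` are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Column names follow the field list in the post. The CSV layout and the
# millisecond units are assumptions; the post does not specify either.
FIELDS = [
    "BatchTime", "BatchStarted", "FirstJobStartTime", "LastJobCompletionTime",
    "FirstJobSchedulingDelay", "TotalJobProcessingTime", "NumberOfRecords",
    "SubmissionTime", "TotalDelay",
]


@dataclass(frozen=True)
class BatchMetrics:
    """Hypothetical container for the columns most relevant to prediction."""
    number_of_records: int
    scheduling_delay_ms: int
    processing_time_ms: int
    total_delay_ms: int


def load_batches(text):
    """Parse CSV text (with a header row) into a list of BatchMetrics."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        BatchMetrics(
            number_of_records=int(row["NumberOfRecords"]),
            scheduling_delay_ms=int(row["FirstJobSchedulingDelay"]),
            processing_time_ms=int(row["TotalJobProcessingTime"]),
            total_delay_ms=int(row["TotalDelay"]),
        )
        for row in reader
    ]
```

Only the four columns named later in the post as most important are kept; the timestamps could be added to the record in the same way if they turn out to be useful.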

Let's say I want to predict the total delay when a batch contains X records. Can anyone suggest which machine learning algorithm would be applicable in this scenario (linear regression, classification, etc.)?

Of course, the most important parameters would be the scheduling delay, total delay, number of records, and job processing time.

Thanks
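
Since TotalDelay is a continuous value, the question above is a regression problem rather than a classification one. As a rough illustration of the idea, here is a minimal sketch of ordinary least squares with a single feature (NumberOfRecords) in plain Python. It is not Spark ML code; Spark ML provides `LinearRegression` in `pyspark.ml.regression` for the same purpose. The function names and the assumption that delay grows roughly linearly with the record count are illustrative only and would need to be checked against real data.

```python
from statistics import mean


def fit_line(records, delays_ms):
    """Fit delay = intercept + slope * records by ordinary least squares.

    `records` and `delays_ms` are parallel lists of historical batch sizes
    and observed total delays. Returns (slope, intercept).
    """
    if len(records) != len(delays_ms) or len(records) < 2:
        raise ValueError("need at least two (records, delay) pairs")
    x_bar = mean(records)
    y_bar = mean(delays_ms)
    # Spread of the feature around its mean.
    sxx = sum((x - x_bar) ** 2 for x in records)
    if sxx == 0:
        raise ValueError("all batches have the same record count")
    # Co-variation of feature and target around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(records, delays_ms))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_delay_ms(model, records_in_batch):
    """Predict the total delay for a batch of the given size."""
    slope, intercept = model
    return intercept + slope * records_in_batch
```

With more features (scheduling delay, processing time), the same idea extends to multiple linear regression. Note that scheduling delay and processing time are only known once a batch has run, so they cannot be used as inputs for predicting a batch before it starts unless they are taken from previous batches.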

1 ACCEPTED SOLUTION

2 REPLIES



@jfrazee, thanks for the reply. I am using Spark Streaming, which processes data in batches. I want to know how long it takes to process a batch for a given application (keeping factors such as the number of nodes in the cluster constant) at a given data rate (records/batch). I eventually want to check an SLA to make sure that the end-to-end delay still falls within it. Therefore I want to gather historical data from the application runs and predict the time to process a batch, so that before starting a new batch I can already predict whether it would violate the SLA. I will look into your suggestions.

Thanks
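
The SLA check described above could look roughly like the following. It is a sketch only, assuming a linear model of the form delay = intercept + slope * records has already been fitted from historical runs. The coefficient values, the millisecond units, the safety margin, and all function names are hypothetical.

```python
import math


def predicted_delay_ms(records_in_batch, slope_ms_per_record, intercept_ms):
    """Predicted total delay for a batch under an assumed linear model."""
    return intercept_ms + slope_ms_per_record * records_in_batch


def violates_sla(records_in_batch, slope_ms_per_record, intercept_ms,
                 sla_ms, safety_margin_ms=0.0):
    """True if the predicted delay plus a safety margin exceeds the SLA."""
    predicted = predicted_delay_ms(
        records_in_batch, slope_ms_per_record, intercept_ms)
    return predicted + safety_margin_ms > sla_ms


def max_records_within_sla(slope_ms_per_record, intercept_ms,
                           sla_ms, safety_margin_ms=0.0):
    """Largest batch size whose predicted delay still fits within the SLA."""
    if slope_ms_per_record <= 0:
        raise ValueError("slope must be positive for a size limit to exist")
    budget_ms = sla_ms - safety_margin_ms - intercept_ms
    return max(0, math.floor(budget_ms / slope_ms_per_record))


# Hypothetical coefficients: 0.5 ms per record plus 50 ms fixed overhead.
if violates_sla(1000, 0.5, 50.0, sla_ms=500.0):
    print("predicted delay exceeds the SLA for a 1000-record batch")
```

The safety margin is there because a fitted model will not match every batch exactly; one reasonable choice would be to derive it from the spread of prediction errors on historical batches.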