@Arsalan Siddiqi The standard approach to delay modeling is to model the delay times with an exponential distribution. There's an analytical Bayesian solution to this (i.e., no MCMC needed), or you can use GeneralizedLinearRegression from MLlib with the "gamma" family (the exponential is a special case of the gamma with alpha = 1).
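To make the analytical route concrete, here is a minimal sketch. It assumes a Gamma(shape, rate) prior on the exponential rate parameter, which is the conjugate choice, so the posterior is again a gamma and can be written down directly. The function name, the prior values, and the sample delays are all made up for illustration.

```python
def exponential_posterior(delays, prior_shape=1.0, prior_rate=1.0):
    """Conjugate update for exponentially distributed delays.

    With a Gamma(prior_shape, prior_rate) prior on the rate, the posterior
    is Gamma(prior_shape + n, prior_rate + sum(delays)).
    """
    n = len(delays)
    total = sum(delays)
    return prior_shape + n, prior_rate + total


# Hypothetical observed delays (units are whatever you measure in).
delays = [2.0, 0.5, 1.5, 4.0]

post_shape, post_rate = exponential_posterior(delays)

# Posterior mean of the rate (events per unit time).
posterior_mean_rate = post_shape / post_rate

# Posterior mean of the expected delay 1/rate; defined only for post_shape > 1.
posterior_mean_delay = post_rate / (post_shape - 1.0)
```

The prior values act like pseudo-observations: prior_shape counts imaginary delays and prior_rate is their total duration, so a weak prior keeps both small.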
There's probably an alternative way to think about the problem in terms of the number of delayed batches, which could be modeled with a Poisson distribution. Without knowing more about the goals, it's hard to say which makes more sense.
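If the count framing fits your goals, a rough sketch of it could look like the following. It assumes the number of delayed batches per fixed window (here, per hour) is Poisson distributed and estimates the rate by the sample mean; the counts and the helper names are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events under a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def prob_more_than(k, rate):
    """Probability of seeing strictly more than k events in one window."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(k + 1))


# Hypothetical number of delayed batches observed in each of seven hours.
delayed_per_hour = [0, 2, 1, 3, 0, 1, 1]

# The maximum likelihood estimate of the Poisson rate is the sample mean.
rate_hat = sum(delayed_per_hour) / len(delayed_per_hour)

# Chance of more than two delayed batches in any given hour.
p_more_than_two = prob_more_than(2, rate_hat)
```

This answers questions like "how often will an hour have more than two delayed batches?", whereas the exponential model answers "how long will a delay last?".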
You can always squint at something and turn it into a classification problem (different classes or buckets for different ranges of data; i.e., quantizing). However, quantities like delays and total counts have a natural order, and standard classification methods don't take this into account (ordered logit does, but AFAICT MLlib doesn't have it out of the box). That said, such an approach often works (what counts as adequate performance is an empirical matter). So if the regression approaches above are beyond your current reach, classification could be acceptable: for a business application (vs. scientific inquiry), it's as important to understand what you've done as it is to be correct.
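As an illustration of the quantizing step, here is a small sketch that maps delays to ordered buckets. The bucket edges and labels are invented for the example; pick ones that match your business rules.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries (in minutes) and matching class labels.
BUCKET_EDGES = [1.0, 5.0, 15.0]
BUCKET_LABELS = ["on_time", "minor", "moderate", "severe"]


def delay_to_class(delay_minutes):
    """Return (class index, label); a delay equal to an edge goes to the higher bucket."""
    index = bisect_right(BUCKET_EDGES, delay_minutes)
    return index, BUCKET_LABELS[index]


delays = [0.5, 1.0, 7.0, 20.0]
classes = [delay_to_class(d) for d in delays]
```

The class index preserves the ordering, but an ordinary multiclass classifier will treat the labels as unordered, which is the limitation described above. In Spark, the Bucketizer transformer in pyspark.ml.feature should do the same job on a DataFrame column.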
Note: For these regression approaches using non-Gaussian error distributions, you will likely need to transform the fitted parameters (e.g., by inverting the link function) for them to be interpretable.
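To show what that transformation looks like, here is a sketch that converts a linear predictor back to an expected delay. As far as I know, the gamma family in MLlib supports the "inverse", "identity", and "log" links, with "inverse" as the default, but check the docs for your version. The coefficient values and the feature are hypothetical, not output from a real fit.

```python
import math


def mean_from_linear_predictor(eta, link):
    """Invert the link function to get the expected response from eta."""
    if link == "log":
        return math.exp(eta)
    if link == "inverse":
        return 1.0 / eta
    if link == "identity":
        return eta
    raise ValueError(f"unsupported link: {link}")


# Hypothetical fitted values from a log-link gamma regression.
intercept = 0.7
coef_load = 0.2   # coefficient on a made-up "load" feature
load = 3.0

eta = intercept + coef_load * load
expected_delay = mean_from_linear_predictor(eta, "log")

# Under the log link, exp(coefficient) is the multiplicative change in the
# expected delay for a one-unit increase in the feature.
multiplicative_effect = math.exp(coef_load)
```

With the log link the coefficients read as percentage-style effects, which tends to be easier to explain than the inverse link, where a larger coefficient means a shorter expected delay.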