Spark ML smoke test?

Hi all,

We have an HDP 2.4.2 cluster configured with Spark. I ran smoke tests (Spark Pi, shell, Spark SQL) for various components. I am now looking for a few smoke tests to prove that Spark has been configured with the ML libraries. Also, how can I make sure that the Spark ML configuration is optimized?

I was planning to run a couple of samples from https://spark.apache.org/docs/1.6.1/mllib-guide.html to make sure ML libs are configured. Is that enough?

Thanks,

SS

1 ACCEPTED SOLUTION

Hi @Smart Solutions,

I think running a couple of those samples would be sufficient to certify that the libraries are installed and that your applications will be able to find them. You can find several ready-to-run examples under /usr/hdp/current/spark-client/examples/src/main/python/mllib. Substitute python in that path with your preferred language to find the examples for the corresponding API.
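If you would rather have a check with an explicit pass/fail result, something along these lines might work as a minimal sketch. It assumes you paste it into the pyspark shell, where `sc` is already defined, and that the Spark 1.6 `pyspark.mllib` API is in use. The function names `groups_are_separated` and `mllib_smoke_test` and the toy data points are made up for illustration; they are not part of the bundled examples.

```python
def groups_are_separated(labels_a, labels_b):
    """Return True if each group received exactly one cluster label
    and the two groups received different labels."""
    set_a, set_b = set(labels_a), set(labels_b)
    return len(set_a) == 1 and len(set_b) == 1 and set_a != set_b


def mllib_smoke_test(sc):
    """Train a tiny KMeans model on two well-separated groups of points.

    `sc` is an existing SparkContext (assumption: the one the pyspark
    shell provides). Returns True if MLlib clustered the groups apart.
    """
    # Imported inside the function so an ImportError here points
    # directly at a missing or misconfigured MLlib installation.
    from pyspark.mllib.clustering import KMeans

    # Hypothetical toy data: one group near (0, 0), one near (9, 9).
    group_a = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0]]
    group_b = [[9.0, 9.0], [9.1, 9.1], [9.2, 9.0]]

    model = KMeans.train(sc.parallelize(group_a + group_b), 2, maxIterations=10)

    labels_a = [model.predict(p) for p in group_a]
    labels_b = [model.predict(p) for p in group_b]
    return groups_are_separated(labels_a, labels_b)
```

In the pyspark shell, `mllib_smoke_test(sc)` should return `True` when MLlib is importable and training runs on the cluster. An `ImportError` or a failed job would suggest that the libraries (or NumPy, which `pyspark.mllib` depends on) are not set up on every node.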

In terms of optimized configurations, it is hard to tune that up front, as it will be highly dependent on your application, dataset, and cluster.
