Spark ML smoke test?

smartninja723 — Tue, 16 Aug 2016 22:06:42 GMT

Hi all,

We have an HDP 2.4.2 cluster configured with Spark. I have already run smoke tests (Spark Pi, the Spark shell, Spark SQL) for the various components. I am now looking for a few smoke tests to confirm that Spark has been configured with the ML libraries. Also, how can I make sure that the Spark ML configuration is optimized?

I was planning to run a couple of samples from https://spark.apache.org/docs/1.6.1/mllib-guide.html to make sure the ML libraries are configured. Is that enough?

Thanks,

Re: Spark ML smoke test?

bwilson — Mon, 22 Aug 2016 08:15:05 GMT

Hi @Smart Solutions,

I think this would be sufficient to certify that the libraries are installed and that your applications will be able to find them. You can find several ready-to-run examples under /usr/hdp/current/spark-client/examples/src/main/python/mllib. You can substitute python with your preferred language to find the examples for the corresponding API.
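If you would rather not depend on the shipped examples, a very small check along these lines can also be pasted into the pyspark shell. This is only a sketch: the function names (mllib_smoke_test, centers_match) and the toy points are made up for illustration, and it assumes the shell has already defined sc, as the pyspark shell normally does. It trains a two-cluster KMeans model from pyspark.mllib and checks that the learned centers land near the two obvious groups.

```python
# Minimal MLlib smoke test, meant to be pasted into the pyspark shell,
# where `sc` (the SparkContext) is already defined.
# All names and data below are illustrative, not part of any HDP tooling.


def centers_match(centers, expected, tol=0.5):
    """Return True if every expected point is within `tol` (per coordinate)
    of at least one learned center."""
    def close(center, point):
        return all(abs(float(a) - b) <= tol for a, b in zip(center, point))

    return all(any(close(c, e) for c in centers) for e in expected)


def mllib_smoke_test(sc):
    """Train a tiny KMeans model and sanity-check the result."""
    # Imported inside the function so that a missing or broken MLlib
    # install (or a missing numpy) surfaces as an error from this call.
    from pyspark.mllib.clustering import KMeans

    # Toy data: two well-separated groups with means (0.5, 0.5) and (9, 9).
    points = [
        [0.0, 0.0], [0.5, 0.5], [1.0, 1.0],
        [8.0, 8.0], [9.0, 9.0], [10.0, 10.0],
    ]
    model = KMeans.train(sc.parallelize(points), 2, maxIterations=10)

    # clusterCenters is a list of numpy arrays, one per cluster.
    return centers_match(model.clusterCenters, [(0.5, 0.5), (9.0, 9.0)])
```

In the shell, calling mllib_smoke_test(sc) should return True if MLlib can be imported and a job can run on the cluster. An ImportError points at the installation rather than at your application.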

In terms of optimized configurations, it is hard to tune upfront, since the right settings depend heavily on your application, dataset, and cluster.