Support Questions

smartninja723 · 08-16-2016

Hi all,

We have an HDP 2.4.2 cluster configured with Spark. I have already run smoke tests (Spark Pi, Spark shell, Spark SQL) for the various components. I am now looking for a few smoke tests to prove that Spark has been configured with the ML libraries. Also, how can I make sure that the Spark ML configuration is optimized?

I was planning to run a couple of samples from https://spark.apache.org/docs/1.6.1/mllib-guide.html to make sure the ML libraries are configured. Is that enough?

Thanks,

SS

bwilson · 08-22-2016

Hi @Smart Solutions,

I think this would be sufficient to verify that the libraries are installed and that your applications will be able to find them. You can find several ready-to-run examples under /usr/hdp/current/spark-client/examples/src/main/python/mllib. Replace python in that path with your preferred language to find the examples for the corresponding API.

In terms of optimized configurations, it is hard to tune those upfront, as the right settings depend heavily on your application, dataset, and cluster.
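
If it helps, here is a minimal sketch of a driver script that lists the Python MLlib examples in that directory and builds a spark-submit command for each one. The examples path is the one mentioned above; the spark-submit location, the `--master yarn --deploy-mode client` options, and the helper names (`list_examples`, `build_command`, `run_smoke_tests`) are assumptions for illustration, so adjust them to your cluster. It only prints the commands by default.

```python
import os
import subprocess

# Examples directory quoted in the answer above.
EXAMPLES_DIR = "/usr/hdp/current/spark-client/examples/src/main/python/mllib"
# Assumption: spark-submit sits under the same spark-client directory.
SPARK_SUBMIT = "/usr/hdp/current/spark-client/bin/spark-submit"


def list_examples(examples_dir=EXAMPLES_DIR):
    """Return the sorted paths of the .py example scripts in examples_dir."""
    return sorted(
        os.path.join(examples_dir, name)
        for name in os.listdir(examples_dir)
        if name.endswith(".py")
    )


def build_command(script, extra_args=(), master="yarn", spark_submit=SPARK_SUBMIT):
    """Build the spark-submit argument list for one example script."""
    # Assumption: the job is submitted to YARN in client mode.
    return [spark_submit, "--master", master, "--deploy-mode", "client", script] + list(extra_args)


def run_smoke_tests(scripts, dry_run=True):
    """Print (dry run) or execute each command; return {script: exit code or None}."""
    results = {}
    for script in scripts:
        cmd = build_command(script)
        if dry_run:
            print(" ".join(cmd))
            results[script] = None
        else:
            # A non-zero exit code means the example failed and needs a closer look.
            results[script] = subprocess.call(cmd)
    return results


if __name__ == "__main__":
    if os.path.isdir(EXAMPLES_DIR):
        run_smoke_tests(list_examples(), dry_run=True)
    else:
        print("Examples directory not found: " + EXAMPLES_DIR)
```

Keep in mind that some of the bundled examples expect command-line arguments or input data to be present, so a failure does not necessarily mean the ML libraries are missing. Check the driver output before drawing conclusions, and switch to `dry_run=False` only once the printed commands look right for your environment.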

Cloudera Community

Spark ML smoke test?