- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Spark ML smoke test?
Created 08-16-2016 03:06 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
We have HDP 2.4.2 cluster configured with Spark. I did run smoke tests (spark PI, shell, Spark SQL) for various components. I am looking forward to a few smoke tests to prove that spark has been configured with ML libraries. Moreover, how to make sure that Spark ML configurations are optimized?
I was planning to run a couple of samples from https://spark.apache.org/docs/1.6.1/mllib-guide.html to make sure ML libs are configured. Is that enough?
Thanks,
SS
Created 08-22-2016 01:15 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Smart Solutions,
I think this would be sufficient to certify that the libraries are installed and your applications will be able to find them. You can find several examples that are ready to run under /usr/hdp/current/spark-client/examples/src/main/python/mllib. You can substitute python with your preferred language to find examples that correspond to the appropriate API.
In terms of optimized configurations, it is hard to tune that upfront as it will be highly dependent upon on your application, dataset, and cluster.
Created 08-22-2016 01:15 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Smart Solutions,
I think this would be sufficient to certify that the libraries are installed and your applications will be able to find them. You can find several examples that are ready to run under /usr/hdp/current/spark-client/examples/src/main/python/mllib. You can substitute python with your preferred language to find examples that correspond to the appropriate API.
In terms of optimized configurations, it is hard to tune that upfront as it will be highly dependent upon on your application, dataset, and cluster.