11-16-2018 01:48 AM
I'm writing a report and am here to get some input from the data science community regarding the Cloudera platform, if you don't mind :)
From what I gather about Cloudera's products from reading its website, Cloudera seems to mainly offer a user interface that lets businesses handle, and maybe also automate, their data science tasks more easily. It seems to mainly use Apache Hadoop as its processing engine, but also offers Apache Spark as an added component for other tasks, e.g. Spark Streaming, Spark ML etc.
Besides the convenience of a ready-made interface for meeting its clients' data science needs, does Cloudera offer any advantage that Hadoop and Spark cannot offer on their own?
I want to ask ...
What is better about it compared to Hadoop and Spark?
What are its main advantages?
Does it have any special or custom features?
All opinions are welcome. Thanks in advance :)
11-18-2018 03:00 AM
Cloudera offers the exact same Hadoop and Spark that you can download directly. They may be a bit behind the latest version and carry some small modifications, but nothing you would notice.
The value is always in managed services. For instance, what is the difference between a vanilla Hadoop cluster (which you install and manage yourself, manually) and one run through Cloudera Manager? It makes your life easier! The same goes for data science: they manage the things you shouldn't spend time and effort doing yourself, so you can spend your time and energy on what matters most, your job as a data scientist. This is the same for most products and services that are managed in the cloud or offered with some management UI. It's all so you can spend more time playing with it rather than figuring out how to make it work.
Hope this helps,
PS: You really should install the QuickStart VM and play around with it. It's hard to know exactly what they offer, and what it's worth, without a demo.