Java or Scala or Python on Spark

Explorer

Hi,

 

Could anyone tell me which language is best for working with Spark: Scala, Java, or Python? And could you explain why?

 

Ravi

1 ACCEPTED SOLUTION

Master Collaborator

Scala is the native language of Spark. All else being equal, Spark is easiest to use from Scala. Of course, not everyone knows Scala or uses it in other projects.

 

Using Spark from Java is only slightly less convenient. You will write more code, since Java's anonymous-class syntax is quite verbose before Java 8 (which adds lambdas). All of the Scala APIs can be called from Java too, although some look awkward when accessed from Java. Most APIs have a Java-friendlier version where necessary to ease this integration.
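
To make the verbosity point concrete, here is a minimal sketch that uses only the JDK, not Spark. The class name `AnonymousVsLambda` and the helper `mapAll` are made up for illustration, and `java.util.function.Function` is an assumed stand-in for the kind of function interface you would implement when passing code to Spark from Java. The same string-length function is written once as an anonymous class and once as a Java 8 lambda:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AnonymousVsLambda {

    // Anonymous-class syntax (the only option before Java 8):
    // six lines of ceremony around a one-line body.
    // Note: java.util.function.Function is itself a Java 8 type; it is used
    // here only as a stand-in for a Spark-style function interface.
    static final Function<String, Integer> LENGTH_ANONYMOUS =
        new Function<String, Integer>() {
            @Override
            public Integer apply(String line) {
                return line.length();
            }
        };

    // Java 8 lambda: the same function in a single expression.
    static final Function<String, Integer> LENGTH_LAMBDA = line -> line.length();

    // Hypothetical helper: applies a function to every element,
    // loosely mimicking a map over a distributed collection.
    static List<Integer> mapAll(List<String> lines, Function<String, Integer> f) {
        return lines.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark", "scala", "java", "python");
        System.out.println(mapAll(lines, LENGTH_ANONYMOUS)); // [5, 5, 4, 6]
        System.out.println(mapAll(lines, LENGTH_LAMBDA));    // [5, 5, 4, 6]
    }
}
```

Both versions behave identically; the difference is purely how much code you have to write and read for each small function.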

 

Python is probably the least convenient of the three, since it is not JVM-based. There is a runtime overhead to translating back and forth between Spark and Python. Not all APIs are exposed in Python. Still, it works, and it is useful if, well, you know Python and want to use it.

Explorer

Thanks a lot for the reply. Could you please tell me the best material to start learning Scala and Spark?

Master Collaborator

There is a Coursera course on Scala right now -- you can still watch the videos although it started weeks ago: https://www.coursera.org/course/progfun

 

There are a number of examples and tutorials on the web concerning Spark. Really, take your pick after searching Google. Here's a blog post I wrote with a quick example: http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/

Explorer

Thanks a lot...