Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Java or Scala or Python on Spark

Explorer

Hi,

Could anyone tell me which language is best for working with Spark (Scala, Java, or Python), and why?

Ravi

1 ACCEPTED SOLUTION

Master Collaborator

Scala is Spark's native language, so all else being equal, Spark is easiest to use from Scala. Of course, not everyone knows Scala or uses it in other projects.

Using Spark from Java is only slightly less convenient. You will write more code, since functions have to be passed as anonymous classes, which are quite verbose before Java 8. All of the Scala APIs can be called from Java too, although some look odd when accessed from Java. Most APIs have a Java-friendlier version where necessary to ease this integration.

Python is probably the least convenient of the three since it is not JVM-based. There is a runtime overhead to translating back and forth between Spark and Python, and not all APIs are available in Python. Still, it works, and it is useful if, well, you know Python and want to use it.
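
To make the point about Java verbosity concrete, here is a minimal sketch that does not use Spark at all. The `mapAll` helper and the `VerbosityDemo` class are hypothetical stand-ins for a Spark-style `map` call, and `java.util.function.Function` is used only for illustration (Spark's Java API ships its own function interfaces). The same function is passed once as an anonymous class, the way pre-Java 8 code has to do it, and once as a Java 8 lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class VerbosityDemo {

    // Hypothetical helper standing in for a Spark-style map(); not part of Spark.
    public static <T, R> List<R> mapAll(List<T> input, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            out.add(f.apply(item));
        }
        return out;
    }

    // Pre-Java 8 style: the function is passed as an anonymous class.
    public static List<Integer> lengthsAnonymous(List<String> words) {
        return mapAll(words, new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        });
    }

    // Java 8+ style: the same function written as a lambda.
    public static List<Integer> lengthsLambda(List<String> words) {
        return mapAll(words, s -> s.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "scala", "java", "python");
        System.out.println(lengthsAnonymous(words)); // prints [5, 5, 4, 6]
        System.out.println(lengthsLambda(words));    // prints [5, 5, 4, 6]
    }
}
```

Both methods return the same result; the difference is only in how much ceremony is needed to pass one small function, which is roughly the gap between Scala and pre-Java 8 Java when writing Spark jobs.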


4 REPLIES


Explorer

Thanks a lot for the reply. Could you please tell me the best material to start learning Scala and Spark?

Master Collaborator

There is a Coursera course on Scala right now -- you can still watch the videos although it started weeks ago: https://www.coursera.org/course/progfun

There are a number of examples and tutorials on the web concerning Spark. Really, take your pick after searching Google. Here's a blog post I wrote with a quick example: http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/

Explorer

Thanks a lot...