Member since: 07-08-2016
Posts: 8
Kudos Received: 0
Solutions: 0
10-25-2018
07:51 PM
Useful for getting SparkMagic to run with Jupyter. The images do not seem to load for me either, but it is still a good how-to article for Jupyter.
07-02-2018
02:06 AM
Thanks, another nice tutorial for Spark, this one with Scala, and with good performance too. Well written and good for learning.
07-02-2018
02:04 AM
The Spark Workshop is a pretty nice way to get up and running quickly.
09-04-2017
04:49 PM
This worked for me, thanks.
03-15-2017
05:50 PM
For the first question only: to cast all the columns with Spark SQL, perhaps something like the following (with col imported from pyspark.sql.functions): DFtable.select([col(i).cast("long") for i in DFtable.columns])
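The same idea can also be written with SQL expression strings instead of col(). This is only a sketch under some assumptions: the helper names bigint_cast_exprs and cast_all_columns are made up for illustration, df is assumed to be a PySpark DataFrame, and BIGINT is used as the SQL name for the long type.

```python
def bigint_cast_exprs(columns):
    """Build one 'CAST(`name` AS BIGINT) AS `name`' expression per column.

    Hypothetical helper, not part of Spark. Backticks in a column name are
    doubled so that unusual names still parse as quoted identifiers.
    """
    exprs = []
    for name in columns:
        quoted = "`" + name.replace("`", "``") + "`"
        exprs.append("CAST({0} AS BIGINT) AS {0}".format(quoted))
    return exprs


def cast_all_columns(df):
    """Return a new DataFrame with every column cast to BIGINT (long).

    Assumes `df` is a PySpark DataFrame; selectExpr takes SQL expression
    strings, so no pyspark import is needed in this snippet.
    """
    return df.selectExpr(*bigint_cast_exprs(df.columns))


# Example usage, assuming DFtable already exists:
# DFlong = cast_all_columns(DFtable)
# DFlong.printSchema()
```

Values that cannot be parsed as a number normally come back as null after the cast, so it is worth checking the result before relying on it.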
03-15-2017
04:37 PM
Agree that Python is likely the easiest, and that someone with a Java background could pick up Scala quickly. With a Java background, using the Java API should also be straightforward, although the code is more verbose. For getting the basic concepts of data parallelism down, Python seemed really fast for implementing the ideas in Spark. As for performance, I believe RDDs are slower in Python than in Scala or Java, while DataFrames are only slightly slower in Python than in the other two languages.
03-15-2017
04:32 PM
Interesting. I assumed that something like the following, as per the Spark manual, would execute in parallel in Spark, applying a function to each element: sc.parallelize([1, 2, 3, 4]).foreach(lambda x: accum.add(x)) But perhaps that is not quite what the above line is doing.
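For reference, here is how I would expect the complete accumulator pattern to look. It is a minimal sketch, assuming sc is an existing SparkContext; the function name sum_with_accumulator is just made up for the example.

```python
def sum_with_accumulator(sc, values):
    """Sum `values` on the executors through a Spark accumulator.

    Assumes `sc` is a live SparkContext. foreach is an action, so the
    lambda runs on the executors, once per element of each partition.
    """
    accum = sc.accumulator(0)  # starts at 0 on the driver
    sc.parallelize(values).foreach(lambda x: accum.add(x))
    return accum.value  # the value is only readable on the driver


# Example usage with a local context (needs pyspark installed):
# from pyspark import SparkContext
# sc = SparkContext("local[2]", "accumulator-sketch")
# print(sum_with_accumulator(sc, [1, 2, 3, 4]))
# sc.stop()
```

Tasks can only add to the accumulator, and the total is read back on the driver after the action finishes, so the elements are processed in parallel across partitions even though the result shows up in one place.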