Is it possible to call a Scala function in Python (PySpark)?
Labels:
- Apache Spark
Explorer
Created on 07-04-2017 07:48 AM - edited 09-16-2022 04:53 AM
Is it possible to call a Scala function from Python? The Scala function takes a DataFrame and returns a DataFrame, ideally with lazy evaluation. Example:
df = sqlContext.read.format("com.databricks.spark.csv").load("...")  # load path omitted in the original
df2 = scalaFunctionBinding(df)  # scalaFunctionBinding stands in for the desired Scala call
df2.take(10)
2 REPLIES
Super Guru
Created 07-06-2017 08:54 AM
Explorer
Created 07-07-2017 07:20 AM
That reply gave me some inspiration; the following worked for me. The Scala object exposes a trivial function called "add" that adds 1 to the first column of the DataFrame:
package example

import org.apache.spark.sql.DataFrame
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object Hello {
  // Returns a new DataFrame with the first column incremented by 1
  def add(df: DataFrame): DataFrame = {
    val fc = df.columns(0)
    df.withColumn(fc, df.col(fc) + 1)
  }
}
from pyspark.sql import DataFrame

# Call the JVM-side function on the underlying Java DataFrame (df._jdf)
# and rewrap the result as a Python DataFrame
df2 = DataFrame(sc._jvm.example.Hello.add(df._jdf), sqlContext)
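For anyone following along, here is a minimal end-to-end sketch of the same Py4J round trip, assuming the Scala object above has been compiled into a jar; the jar name and sample data are illustrative, not from this thread:

# Start PySpark with the compiled Scala code on the classpath, e.g.:
#   pyspark --jars hello-example.jar   (jar name is an assumption)
from pyspark.sql import DataFrame

# Small test DataFrame (illustrative data)
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["n", "s"])

# sc._jvm reaches JVM classes/objects by fully qualified name through
# the Py4J gateway; df._jdf is the Java DataFrame backing the Python one.
jdf = sc._jvm.example.Hello.add(df._jdf)

# Rewrap the returned Java DataFrame as a Python DataFrame. Nothing is
# computed yet: the added column expression stays lazy until an action runs.
df2 = DataFrame(jdf, sqlContext)
df2.show()  # action triggers evaluation; first column comes back incremented

Note that this also answers the lazy-evaluation part of the question: Py4J only passes a reference to the JVM DataFrame back and forth, so no data crosses the gateway and nothing is computed until an action like show() or take() is called.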
