Created on 07-04-2017 07:48 AM - edited 09-16-2022 04:53 AM
Is it possible to call a Scala function from Python? The Scala function takes a DataFrame and returns a DataFrame. If possible, with lazy evaluation. Example:
df = sqlContext.read.format("com.databricks.spark.csv").load("...")
df2 = scalaFunctionBinding(df)
df2.take(10)
Created 07-07-2017 07:20 AM
That gave me some inspiration; the following worked for me:
The Scala object below exposes a trivial function called "add" that adds 1 to the first column of the DataFrame:
package example

import org.apache.spark.sql.DataFrame

object Hello {
  def add(df: DataFrame): DataFrame = {
    val fc = df.columns(0)
    df.withColumn(fc, df.col(fc) + 1)
  }
}
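One assumption on my part, since the build step isn't shown above: the compiled class has to be on the driver's JVM classpath before sc._jvm can resolve example.Hello. A minimal sketch of one way to do that from PySpark (the jar path is illustrative, point it at whatever your Scala build produces):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# "spark.jars" puts the jar on the driver and executor classpaths;
# the path below is illustrative.
conf = SparkConf().set("spark.jars", "/path/to/hello-example.jar")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

Passing --jars to the pyspark launcher achieves the same thing.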
from pyspark.sql import DataFrame

# Call the JVM-side function through py4j and wrap the returned
# Java DataFrame back into a Python DataFrame.
df2 = DataFrame(sc._jvm.example.Hello.add(df._jdf), sqlContext)
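On the lazy-evaluation part of the question: this stays lazy, because withColumn on the JVM side is a transformation and only builds a new logical plan. Wrapping the py4j call in a small Python helper keeps the usual DataFrame feel; a sketch, where the helper name scala_add is my own:

from pyspark.sql import DataFrame

def scala_add(df):
    """Apply the Scala-side Hello.add to a PySpark DataFrame (hypothetical helper)."""
    jdf = sc._jvm.example.Hello.add(df._jdf)
    return DataFrame(jdf, sqlContext)

df2 = scala_add(df)  # lazy: only the plan is built, no Spark job runs yet
df2.take(10)         # the action triggers execution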