Is it possible to call a Scala function in Python (PySpark)?

Is it possible to call a Scala function from Python? The Scala function takes a DataFrame and returns a DataFrame. If possible, it should preserve lazy evaluation. Example:

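# Parentheses allow the method chain to span several lines in Python.
df = (sqlContext.read
    .format("com.databricks.spark.csv")
    .load("data.csv"))  # placeholder path; the original question did not give one
df2 = scalaFunctionBinding(df)  # the Scala binding being asked about
df2.take(10)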

Hi Wiljan,

Can you please check if this link helps.

-Aditya

That gave me some inspiration. This worked for me:

The Scala object below exposes a trivial function called "add" that adds 1 to the first column of the DataFrame:

package example

import org.apache.spark.sql.DataFrame

object Hello {
  // Adds 1 to the first column of the DataFrame. withColumn is a lazy
  // transformation, so nothing executes until an action runs.
  def add(df: DataFrame): DataFrame = {
    val fc = df.columns(0)            // name of the first column
    df.withColumn(fc, df.col(fc) + 1) // reusing the name replaces that column
  }
}
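
After building this into a jar and adding it to the Spark classpath (for example with --jars), call it from PySpark:

from pyspark.sql import DataFrame
# Pass the underlying Java DataFrame (df._jdf) to Scala, then wrap the result for Python.
df2 = DataFrame(sc._jvm.example.Hello.add(df._jdf), sqlContext)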
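
If you end up calling several Scala functions this way, the unwrap/call/re-wrap pattern can be pulled into a small helper. The following is only a sketch: the name call_scala and its wrapper parameter are made up for illustration and are not part of PySpark. It assumes the Scala method takes and returns a single DataFrame, as example.Hello.add does.

```python
from functools import reduce


def call_scala(jvm, qualified_name, df, sql_context, wrapper=None):
    """Call a Scala `DataFrame => DataFrame` method and re-wrap the result.

    jvm            -- the Py4J JVM view, e.g. sc._jvm
    qualified_name -- dotted path to the method, e.g. "example.Hello.add"
    df             -- a PySpark DataFrame (only its _jdf attribute is used)
    sql_context    -- the context handed to the DataFrame wrapper
    wrapper        -- hypothetical hook: class used to wrap the returned Java
                      DataFrame; defaults to pyspark.sql.DataFrame
    """
    if wrapper is None:
        # Imported lazily so the helper can be exercised without Spark installed.
        from pyspark.sql import DataFrame as wrapper
    # Walk example -> Hello -> add by attribute access, exactly as the
    # expression sc._jvm.example.Hello.add does.
    scala_function = reduce(getattr, qualified_name.split("."), jvm)
    # The Scala side only ever sees the underlying Java DataFrame.
    return wrapper(scala_function(df._jdf), sql_context)
```

With the jar on the classpath, usage would look like df2 = call_scala(sc._jvm, "example.Hello.add", df, sqlContext). On the lazy-evaluation part of the question: the Scala function only calls withColumn, which is a transformation, so the call just extends the query plan. Nothing is computed until an action such as df2.take(10) runs.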