Created on 07-04-2017 07:48 AM - edited 09-16-2022 04:53 AM
Is it possible to call a Scala function from Python? The Scala function takes a DataFrame and returns a DataFrame. If possible, with lazy evaluation. Example:
df = sqlContext.read.format("com.databricks.spark.csv").load("...")
df2 = scalaFunctionBinding(df)
df2.take(10)
Created 07-07-2017 07:20 AM
That gave me some inspiration; the following worked for me:
The Scala object below exposes a trivial function called "add" that adds 1 to the first column of the DataFrame:
package example

import org.apache.spark.sql.DataFrame

object Hello {
  def add(df: DataFrame): DataFrame = {
    val fc = df.columns(0)
    df.withColumn(fc, df.col(fc) + 1)
  }
}
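One assumption on my part, since the build step isn't shown above: the compiled class has to be on the driver's JVM classpath before sc._jvm can resolve example.Hello. A minimal sketch of one way to do that from PySpark (the jar path is illustrative, point it at whatever your Scala build produces):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# "spark.jars" puts the jar on the driver and executor classpaths;
# the path below is illustrative.
conf = SparkConf().set("spark.jars", "/path/to/hello-example.jar")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

Passing --jars to the pyspark launcher achieves the same thing.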
from pyspark.sql import DataFrame

# Call the JVM-side function through py4j and wrap the returned
# Java DataFrame back into a Python DataFrame.
df2 = DataFrame(sc._jvm.example.Hello.add(df._jdf), sqlContext)
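On the lazy-evaluation part of the question: this stays lazy, because withColumn on the JVM side is a transformation and only builds a new logical plan. Wrapping the py4j call in a small Python helper keeps the usual DataFrame feel; a sketch, where the helper name scala_add is my own:

from pyspark.sql import DataFrame

def scala_add(df):
    """Apply the Scala-side Hello.add to a PySpark DataFrame (hypothetical helper)."""
    jdf = sc._jvm.example.Hello.add(df._jdf)
    return DataFrame(jdf, sqlContext)

df2 = scala_add(df)  # lazy: only the plan is built, no Spark job runs yet
df2.take(10)         # the action triggers execution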