I want to iterate over every row of a DataFrame without using collect. Here is my current implementation:
val df = spark.read.csv("/tmp/s0v00fc/test_dir")

import scala.collection.mutable.Map

// m1 maps each value to the index it was first seen at; m4 is the inverse mapping
val m1 = Map[Int, Int]()
val m4 = Map[Int, Int]()
var j = 1

// Assign the next index to m and n if they have not been seen before
def Test(m: Int, n: Int): Unit = {
  if (!m1.contains(m)) {
    m1 += (m -> j)
    m4 += (j -> m)
    j += 1
  }
  if (!m1.contains(n)) {
    m1 += (n -> j)
    m4 += (j -> n)
    j += 1
  }
}

df.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }
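To illustrate the intended result, suppose the CSV contains the two rows (5, 7) and (7, 9) (made-up data, just for illustration). Tracing Test by hand, I would expect:

m1 = Map(5 -> 1, 7 -> 2, 9 -> 3)  // value -> index it was first seen at
m4 = Map(1 -> 5, 2 -> 7, 3 -> 9)  // index -> value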
This runs without any error, but m1 and m4 are still empty afterwards. I get the result I am expecting if I do a df.collect first, as shown below:
// Collecting first works because all rows are brought back to the driver,
// so Test mutates the driver-local maps
df.collect.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }
How do I execute the custom function Test on every row of the DataFrame without using collect?
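One thing I have come across is Dataset.toLocalIterator, which, as I understand it, streams the rows to the driver one partition at a time instead of materializing the whole result the way collect does. A minimal sketch of how I think it would apply here (my assumption, not yet verified against real data):

import scala.collection.JavaConverters._

// toLocalIterator() returns a java.util.Iterator[Row]; the foreach body runs
// on the driver, so the driver-local maps m1, m4 and the counter j are updated
df.toLocalIterator().asScala.foreach { row =>
  Test(row(0).toString.toInt, row(1).toString.toInt)
}

Is this the right direction, or is there a more idiomatic way?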