02-28-2019 08:27 PM
I want to iterate over every row of a DataFrame without using collect. Here is my current implementation:

val df = spark.read.csv("/tmp/s0v00fc/test_dir")

import scala.collection.mutable.Map

var m1 = Map[Int, Int]()
var m4 = Map[Int, Int]()
var j = 1

def Test(m: Int, n: Int): Unit = {
  if (!m1.contains(m)) {
    m1 += (m -> j)
    m4 += (j -> m)
    j += 1
  }
  if (!m1.contains(n)) {
    m1 += (n -> j)
    m4 += (j -> n)
    j += 1
  }
}

df.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }

This runs without any error, but m1 and m4 are still empty afterwards. I get the result I expect if I use collect, as shown below:

df.collect.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }

How do I execute the custom function Test on every row of the DataFrame without using collect?
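For context, one reason this pattern leaves m1 and m4 empty is that df.foreach runs its closure on the executors, which receive serialized copies of the driver-side variables; the originals on the driver are never mutated. A minimal sketch of an alternative, assuming the goal is to run Test on the driver without materializing the whole DataFrame at once, is Dataset.toLocalIterator, which streams one partition at a time (the df and Test names are from the post above; this is a sketch, not a tested solution):

```scala
// Sketch: iterate rows on the driver without a full collect.
// toLocalIterator pulls partitions back one at a time, so only a single
// partition needs to fit in driver memory.
import scala.collection.JavaConverters._

df.toLocalIterator.asScala.foreach { row =>
  // Test mutates the driver-local maps m1/m4, which works here because
  // the closure executes in the driver JVM, not on the executors.
  Test(row(0).toString.toInt, row(1).toString.toInt)
}
```

Note that this still serializes the iteration on the driver; it avoids the memory cost of collect, not the single-threaded traversal.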
Labels:
- Apache Spark