02-28-2019 08:27 PM
I want to iterate over every row of a DataFrame without using collect. Here is my current implementation:

val df = spark.read.csv("/tmp/s0v00fc/test_dir")

import scala.collection.mutable.Map

var m1 = Map[Int, Int]()
var m4 = Map[Int, Int]()
var j = 1

def Test(m: Int, n: Int): Unit = {
  if (!m1.contains(m)) {
    m1 += (m -> j)
    m4 += (j -> m)
    j += 1
  }
  if (!m1.contains(n)) {
    m1 += (n -> j)
    m4 += (j -> n)
    j += 1
  }
}

df.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }

This runs without any error, but m1 and m4 are still empty afterwards. I get the result I expect if I use collect, as shown below:

df.collect.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }

How do I execute the custom function Test on every row of the DataFrame without using collect?
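For context, one reason this pattern leaves m1 and m4 empty is that df.foreach runs its closure on the executors, which receive serialized copies of the driver-side variables; the originals on the driver are never mutated. A minimal sketch of an alternative, assuming the goal is to run Test on the driver without materializing the whole DataFrame at once, is Dataset.toLocalIterator, which streams one partition at a time (the df and Test names are from the post above; this is a sketch, not a tested solution):

```scala
// Sketch: iterate rows on the driver without a full collect.
// toLocalIterator pulls partitions back one at a time, so only a single
// partition needs to fit in driver memory.
import scala.collection.JavaConverters._

df.toLocalIterator.asScala.foreach { row =>
  // Test mutates the driver-local maps m1/m4, which works here because
  // the closure executes in the driver JVM, not on the executors.
  Test(row(0).toString.toInt, row(1).toString.toInt)
}
```

Note that this still serializes the iteration on the driver; it avoids the memory cost of collect, not the single-threaded traversal.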
Labels:
- Apache Spark