Iterate every row of a Spark DataFrame without using collect
Labels:
- Apache Spark
New Contributor
Created on 02-28-2019 08:27 PM - edited 09-16-2022 07:12 AM
I want to iterate every row of a dataframe without using collect. Here is my current implementation:
```scala
val df = spark.read.csv("/tmp/s0v00fc/test_dir")

import scala.collection.mutable.Map

var m1 = Map[Int, Int]()
var m4 = Map[Int, Int]()
var j = 1

def Test(m: Int, n: Int): Unit = {
  if (!m1.contains(m)) {
    m1 += (m -> j)
    m4 += (j -> m)
    j += 1
  }
  if (!m1.contains(n)) {
    m1 += (n -> j)
    m4 += (j -> n)
    j += 1
  }
}

df.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }
```
This does not give any error, but m1 and m4 are still empty. I can get the result I am expecting if I use df.collect, as shown below:

```scala
df.collect.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }
```
How do I execute the custom function "Test" on every row of the DataFrame without using collect?
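For reference, the foreach version leaves m1 and m4 empty because Spark serializes the closure (including copies of m1, m4, and j) to the executors; only those executor-local copies are mutated, and the driver-side maps never see the updates. Below is a minimal, collect-free sketch of one distributed way to build the same kind of lookup maps: take the distinct values of both columns and index them with zipWithIndex. The column names `_c0`/`_c1`, the 1-based ids, and the decision to collect only the (much smaller) distinct set back to the driver are assumptions, and the ids will not follow the insertion order of the original sequential loop.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Same input path as in the question (assumed two integer-like CSV columns _c0 and _c1).
val df = spark.read.csv("/tmp/s0v00fc/test_dir")

// Distinct values of both columns, indexed on the executors with zipWithIndex.
// Only the (value, index) pairs for the distinct values ever reach the driver.
val indexed = df.select($"_c0".cast("int"))
  .union(df.select($"_c1".cast("int")))
  .distinct()
  .as[Int]
  .rdd
  .zipWithIndex()                              // (value, 0-based index)
  .map { case (v, i) => (v, i.toInt + 1) }     // 1-based ids, like j in the post

val m1 = indexed.collectAsMap()                // value -> id
val m4 = indexed.map(_.swap).collectAsMap()    // id -> value
```

If the maps only need to be applied back to the data (rather than used on the driver), the indexed pairs can also be kept as a DataFrame and joined in, avoiding the collect entirely.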
