Iterate over every row of a Spark DataFrame without using collect


I want to iterate over every row of a DataFrame without using collect. Here is my current implementation:

val df = spark.read.csv("/tmp/s0v00fc/test_dir")

import scala.collection.mutable.Map

// m1 maps each value to its index, m4 is the reverse lookup
var m1 = Map[Int, Int]()
var m4 = Map[Int, Int]()
var j = 1

// Assigns the next free index to any value it has not seen before
def Test(m: Int, n: Int): Unit = {
  if (!m1.contains(m)) {
    m1 += (m -> j)
    m4 += (j -> m)
    j += 1
  }
  if (!m1.contains(n)) {
    m1 += (n -> j)
    m4 += (j -> n)
    j += 1
  }
}

df.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }

This runs without any error, but m1 and m4 are still empty. I can get the result I expect if I use df.collect, as shown below:

df.collect.foreach { row => Test(row(0).toString.toInt, row(1).toString.toInt) }

How do I execute the custom function "Test" on every row of the DataFrame without using collect?
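
For context on the behavior: df.foreach runs Test on the executors, so Spark serializes copies of m1, m4, and j into the closure; each executor updates its own copies and discards them when the task finishes, which is why the driver-side maps stay empty. df.collect works because it first pulls every row back to the driver, where Test mutates the original maps, at the cost of holding the entire DataFrame in driver memory.

A minimal sketch of a middle ground, assuming the maps really do need to live on the driver: Dataset.toLocalIterator() streams the rows back one partition at a time, so Test still runs on the driver but only a single partition has to fit in memory at once.

import scala.collection.JavaConverters._

// toLocalIterator returns a java.util.Iterator[Row] that pulls
// partitions to the driver sequentially instead of materializing
// the whole DataFrame the way collect does.
df.toLocalIterator().asScala.foreach { row =>
  Test(row(0).toString.toInt, row(1).toString.toInt)
}

If the goal is only to assign a unique index to every distinct value, a fully distributed alternative is to union the two columns, call distinct, and zip the result with an index (for example via RDD.zipWithIndex), collecting just the small distinct set; note that the resulting numbering order may differ from the sequential counter above.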
