
Spark-sql: How to traverse through each row of the Dataframe

New Contributor

Hi,

I have a DataFrame (df) created from a Hive table (warehouse directory). On df I have performed a group and count and created a new DataFrame (dfGrp). Now I want to call a method on each row of dfGrp. How do I traverse every row of the DataFrame? My code looks like this:

val df = sqlContext.read.parquet("/user/hive/warehouse/xxx")
val dfGrp = df.groupBy("col4").count().select(col("col4"), col("count").as("BadCount"))

I want to call the method DataReport(String, String, Double, DataFrame) for each row of dfGrp. Its return type is DataFrame.

case class DataReport(query: String, col4: String, BadCount: Double, df: DataFrame)

How do I call the method for every row of the DataFrame or RDD?
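One possible pattern is to collect dfGrp to the driver and call the method there, since the grouped result has only one row per distinct col4 value. A minimal sketch under the assumption that dfGrp is small enough to collect; the query string "myQuery" is a placeholder:

import org.apache.spark.sql.DataFrame

// Bring the grouped rows to the driver and build one DataReport each.
// A method that takes or returns a DataFrame can only run on the
// driver, not inside code shipped to the executors.
val reports: Array[DataReport] = dfGrp.collect().map { row =>
  DataReport(
    "myQuery",                            // placeholder query string
    row.getAs[String]("col4"),
    row.getAs[Long]("BadCount").toDouble, // count() yields a Long
    df)
}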


Re: Spark-sql: How to traverse through each row of the Dataframe

Rising Star

I can't tell exactly what you want to do with the DataReport case class... Anyway, if you want to apply a function to each row of a DataFrame, you have two options (sketched below):

  1. create a UDF, with sqlContext.udf.register("udfName", /* your Scala function */ )
  2. do dfGrp.rdd.map( row => /* your Scala function */ )
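Rough sketches of both, reusing the column names from the question; the function bodies (the "toLabel" rule and the doubling in the map) are placeholder examples, not anything prescribed:

// Option 1: register a UDF and apply it per row in a SQL expression.
// A UDF receives column values and must return a SQL-mappable type.
sqlContext.udf.register("toLabel", (badCount: Long) =>
  if (badCount > 100) "bad" else "ok")
dfGrp.selectExpr("col4", "toLabel(BadCount) as label").show()

// Option 2: drop down to the RDD and map over whole rows.
import org.apache.spark.sql.Row
val mapped = dfGrp.rdd.map { case Row(col4: String, badCount: Long) =>
  (col4, badCount * 2) // placeholder per-row computation
}
mapped.take(5).foreach(println)

Note that both run on the executors, so the function can transform row values but cannot construct or return a DataFrame; anything that needs a DataFrame result (like your DataReport) has to happen on the driver.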