Iterate a dataframe

Expert Contributor

Hello, I would like to iterate over a column of my dataframe and perform an accumulated calculation on it, but I cannot get it to work. Can you help me? Thank you.

Here is how I create my dataframe. I would like to calculate an accumulated value of the blglast column and store it in a new column.

from pyspark.sql import HiveContext
from pyspark import SparkContext

sc = SparkContext()
hive_context = HiveContext(sc)
tab = hive_context.table("table")
tab.registerTempTable("tab_temp")
df = hive_context.sql("SELECT blglast FROM tab_temp AS b limit 50")

df.show()

1 ACCEPTED SOLUTION

# Go through df.rdd: DataFrame.map was removed from PySpark in Spark 2.0
iterate = df.rdd.map(lambda p: "Name: " + p.blglast)
for iteration in iterate.collect():
    print(iteration)

@alain TSAFACK

Please refer to http://spark.apache.org/docs/latest/sql-programming-guide.html for more information.


3 REPLIES 3

Super Guru

After df = hive_context.sql("SELECT blglast FROM tab_temp AS b limit 50"), you can get the Row objects and perform your custom logic on them.

>>> for row in df.rdd.collect():
...     do_something(row)  # placeholder: replace with your custom logic
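
For the accumulated calculation asked about in the question, the custom logic inside that loop could be a running total. The sketch below is only an illustration under assumptions: a plain Python list stands in for the values pulled out of the collected rows (for example `[row.blglast for row in df.rdd.collect()]`), the sample numbers are made up, and `running_total` is a hypothetical helper name. Collecting to the driver is reasonable for the 50 rows in the question, but probably not for a large table.

```python
from itertools import accumulate


def running_total(values):
    """Return the cumulative sum of values, one entry per input value."""
    return list(accumulate(values))


# Stand-in for [row.blglast for row in df.rdd.collect()]; sample data is made up.
blglast_values = [10, 20, 5, 15]
totals = running_total(blglast_values)

# Print each original value next to its accumulated total.
for value, total in zip(blglast_values, totals):
    print(value, total)
```

On larger data, a Spark SQL window function (a sum over an ordered window) would keep the computation distributed, but that needs a column that defines the row order, which the query in the question does not select.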

Expert Contributor

Hello, thank you for the pointers. But I'm new to dataframes, and what I am trying to do is retrieve the values at indices i and i + 1, for example. Best regards
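
One way to read this follow-up is as a request for each value together with its successor. A minimal sketch, assuming the column has already been collected into a Python list on the driver (the sample values and the `consecutive_pairs` name are hypothetical, not part of the thread):

```python
def consecutive_pairs(values):
    """Pair each element at index i with the element at index i + 1."""
    # zip stops at the shorter input, so the last element starts no pair.
    return list(zip(values, values[1:]))


# Stand-in for [row.blglast for row in df.rdd.collect()]; sample data is made up.
blglast_values = [10, 20, 5, 15]

for current, following in consecutive_pairs(blglast_values):
    # Example calculation on neighbours: the difference between i + 1 and i.
    print(current, following, following - current)
```

Within Spark itself, the `lag` and `lead` functions in `pyspark.sql.functions` serve a similar purpose over an ordered window, again provided there is a column to order by.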