Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Iterate a dataframe

Expert Contributor

Hello, I would like to iterate over my dataframe and compute a cumulative total over one of its columns, but I can't get it to work. Can you help me? Thank you.

Here is how I create my dataframe. I would like to compute a running total of the blglast column and store it in a new column.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)

# Load the Hive table and register it under a temporary name for SQL queries
tab = hive_context.table("table")
tab.registerTempTable("tab_temp")

# Keep only the blglast column, limited to 50 rows
df = hive_context.sql("SELECT blglast FROM tab_temp AS b LIMIT 50")
df.show()

1 ACCEPTED SOLUTION

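# Go through df.rdd: PySpark DataFrames in Spark 2.0+ have no map() method.
# str() guards against blglast not being a string column.
iterate = df.rdd.map(lambda p: "Name: " + str(p.blglast))
for iteration in iterate.collect():
    print(iteration)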

@alain TSAFACK

Please refer to http://spark.apache.org/docs/latest/sql-programming-guide.html for more information.


3 REPLIES

Super Guru

After df = hive_context.sql("SELECT blglast FROM tab_temp AS b limit 50"), you can collect the Row objects and apply your custom logic to each one.

>>> for row in df.rdd.collect():
...     print(row.blglast)  # replace with your own logic
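Building on the collect() loop above, here is a minimal sketch of the running total the question asks for. It assumes the result is small enough to bring to the driver (the query is limited to 50 rows). The namedtuple is only a stand-in for Spark's Row so the snippet runs without a cluster, and the sample values are made up.

```python
from collections import namedtuple
from itertools import accumulate

# Stand-in for pyspark.sql.Row so the sketch runs without a cluster;
# with Spark you would use: rows = df.collect()
Row = namedtuple("Row", ["blglast"])
rows = [Row(10.0), Row(2.5), Row(7.0), Row(0.5)]  # made-up sample values

# Pull the column out of the row objects, then build the running total
values = [row.blglast for row in rows]
running_total = list(accumulate(values))

# Pair each original value with its accumulated value
result = list(zip(values, running_total))
for value, total in result:
    print(value, total)
```

If the pairs need to go back into a dataframe, something like hive_context.createDataFrame(result, ["blglast", "blglast_cum"]) should work; the column name blglast_cum is just a placeholder.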

Expert Contributor

Hello, thank you for the pointers. I'm new to dataframes, though, and what I am trying to do is retrieve the values at indices i and i + 1, for example. Best regards
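
One possible way to get at the values at i and i + 1, sketched in plain Python on an already collected list. The sample values are made up and stand in for [row.blglast for row in df.collect()]. Note that a LIMIT query without ORDER BY does not guarantee row order, so "next row" is only meaningful if the query sorts its result.

```python
# Made-up values standing in for [row.blglast for row in df.collect()]
values = [10.0, 2.5, 7.0, 0.5]

# zip stops at the shorter input, so this yields (values[i], values[i + 1])
# for every i that has a successor
pairs = list(zip(values, values[1:]))

# Example use: difference between each value and the one before it
differences = [nxt - cur for cur, nxt in pairs]

for i, (cur, nxt) in enumerate(pairs):
    print(i, cur, nxt)
```

For data too large to collect, Spark's window functions (for example pyspark.sql.functions.lag over a Window with an explicit ordering column) are the usual route, but they need a column to order by, which the query above does not select.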