Archives of Support Questions (Read Only)

nanyim_alain · 06-14-2016

Hello, I would like to iterate over a column of my DataFrame and compute an accumulated (running) value, but I cannot get it to work. Can you help me? Thank you.

Here is how I create my DataFrame. I would like to compute a running (cumulative) total of the blglast column and store it in a new column.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)

# Register the Hive table under a temporary name so it can be queried with SQL
tab = hive_context.table("table")
tab.registerTempTable("tab_temp")

# Load the first 50 values of the blglast column
df = hive_context.sql("SELECT blglast FROM tab_temp AS b LIMIT 50")

df.show()

sandyy006 · 06-14-2016

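# df.rdd exposes the rows as an RDD; str() lets this work for numeric columns too
iterate = df.rdd.map(lambda p: "Name: " + str(p.blglast))
# collect() brings the mapped strings back to the driver
for iteration in iterate.collect():
    print(iteration)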

@alain TSAFACK

Please refer to http://spark.apache.org/docs/latest/sql-programming-guide.html for more information.
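One caveat with the map approach: if blglast is numeric, "Name: " + p.blglast raises a TypeError, because Python does not concatenate a str with a number. A minimal local sketch of the fix, using a namedtuple (FakeRow, hypothetical) as a stand-in for a PySpark Row so it runs without a cluster:

```python
from collections import namedtuple

# Hypothetical stand-in for a pyspark.sql.Row with a single blglast field
FakeRow = namedtuple("FakeRow", ["blglast"])


def label(p):
    # str() makes the concatenation work whether blglast is text or numeric
    return "Name: " + str(p.blglast)


rows = [FakeRow(1.5), FakeRow(2.0)]
for line in map(label, rows):
    print(line)
```

In Spark the same label function can be passed to df.rdd.map(label) before collect().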


rajkumar_singh · 06-14-2016

After df = hive_context.sql("SELECT blglast FROM tab_temp AS b LIMIT 50"), you can collect the Row objects and apply your custom logic to each one:

>>> for row in df.rdd.collect():
...     print(row.blglast)  # replace with your own logic for each Row
...
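To turn that loop into the running total the question asks for, it only needs an accumulator. A minimal sketch, assuming blglast is numeric and that NULLs (None) should be skipped; FakeRow is a hypothetical stand-in for the Row objects returned by df.rdd.collect(), so the example runs without Spark:

```python
from collections import namedtuple
from typing import Iterable, List

# Hypothetical stand-in for the Row objects returned by df.rdd.collect()
FakeRow = namedtuple("FakeRow", ["blglast"])


def running_totals(rows: Iterable) -> List[float]:
    """Accumulate blglast row by row, skipping NULLs (None) the way SQL SUM does."""
    total = 0.0
    totals = []
    for row in rows:
        if row.blglast is not None:
            total += float(row.blglast)
        # A NULL row simply repeats the previous total
        totals.append(total)
    return totals


rows = [FakeRow(10.0), FakeRow(None), FakeRow(2.5)]
print(running_totals(rows))  # [10.0, 10.0, 12.5]
```

With a real DataFrame this would be running_totals(df.rdd.collect()). Two assumptions matter: a LIMIT without ORDER BY returns rows in no guaranteed order, so the query needs an ORDER BY on whatever column defines the sequence; and collecting is only reasonable because LIMIT 50 keeps the data small. For a full table, a windowed sum (pyspark.sql.functions.sum over Window.orderBy(...).rowsBetween(...)) keeps the computation inside Spark.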

nanyim_alain · 06-21-2016

Hello, thank you for the pointers. I'm new to DataFrames, and what I am trying to do is retrieve the values at indices i and i + 1, for example. Best regards
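Once the 50 values are on the driver (for example values = [row.blglast for row in df.collect()]), the values at indices i and i + 1 are just neighbouring list elements. A minimal sketch, assuming a plain Python list in the order the rows were collected; the sample numbers are made up:

```python
from typing import Any, List, Sequence, Tuple


def consecutive_pairs(values: Sequence[Any]) -> List[Tuple[Any, Any]]:
    """Return (values[i], values[i + 1]) for every index i that has a successor."""
    return list(zip(values, values[1:]))


# Hypothetical sample standing in for the collected blglast values
values = [10.0, 12.5, 11.0]
for i, (current, following) in enumerate(consecutive_pairs(values)):
    # e.g. compare each value with the next one
    print(i, current, following, following - current)
```

Zipping the list with itself shifted by one avoids manual index arithmetic and stops cleanly at the last element. For data too large to collect, the lag and lead window functions in pyspark.sql.functions give each row access to its neighbour over an ordered Window instead.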

Cloudera Community


Iterate a dataframe