Created 06-14-2016 08:43 AM
Hello, I would like to iterate over a column of my DataFrame and compute a cumulative (running) calculation on it, but I can't get it to work. Can you help me? Thank you.
Here is how I create my DataFrame. I would like to compute a cumulative total of the blglast column and store it in a new column.
from pyspark.sql import HiveContext
from pyspark import SparkContext
from pandas import DataFrame as df
sc = SparkContext()
hive_context = HiveContext(sc)
tab = hive_context.table("table")
tab.registerTempTable("tab_temp")
df = hive_context.sql("SELECT blglast FROM tab_temp AS b limit 50")
df.show()
Created 06-14-2016 11:11 AM
iterate = df.rdd.map(lambda p: "Name: " + str(p.blglast))
for iteration in iterate.collect():
    print(iteration)
Please refer to http://spark.apache.org/docs/latest/sql-programming-guide.html for more information.
Created 06-14-2016 09:10 AM
After df = hive_context.sql("SELECT blglast FROM tab_temp AS b limit 50"), you can collect the Row objects and apply your custom logic to each one.
>>> for row in df.rdd.collect():
...     print(row.blglast)  # replace with your own per-row logic
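As a rough illustration of what that custom logic could look like for a running total, here is a minimal sketch. It assumes blglast is numeric and that the 50 rows fit comfortably on the driver. The namedtuple and the sample values are hypothetical stand-ins for the Row objects that df.rdd.collect() would return, so the snippet runs without a cluster.

```python
from collections import namedtuple
from itertools import accumulate

# Hypothetical stand-in for the Row objects returned by df.rdd.collect()
Row = namedtuple("Row", ["blglast"])
rows = [Row(10.0), Row(5.0), Row(2.5)]

# Pull the column out of the rows, in the order they were collected
values = [row.blglast for row in rows]

# accumulate() yields the running sum after each element
running_total = list(accumulate(values))

for value, total in zip(values, running_total):
    print(value, total)
```

Note that the result depends on the order in which the rows come back, so the query would probably need an ORDER BY for the running total to be meaningful.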
Created 06-21-2016 08:30 AM
Hello, Thank you for the pointers. However, I am new to DataFrames, and what I am trying to do is retrieve the values at indices i and i + 1, for example. Best regards
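One possible way to get at the values at positions i and i + 1, once the column has been collected to the driver, is to pair the list with a copy of itself shifted by one. This is only a sketch: the sample values are hypothetical and stand in for the collected blglast column, and it assumes the rows were returned in a meaningful order.

```python
# Hypothetical sample values standing in for the collected blglast column
values = [10.0, 5.0, 2.5, 4.0]

# Pair each element (index i) with its successor (index i + 1)
pairs = list(zip(values, values[1:]))

for current, following in pairs:
    # Example per-pair calculation: difference between consecutive values
    print(current, following, following - current)
```

For data too large to collect, the same pairing is usually expressed inside Spark with the lag or lead window functions from pyspark.sql.functions, which require an explicit ordering column.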