PySpark Duplicating row

Hi, I am pretty new to PySpark, and I am loading my data from a CSV file.

My question is: how can I get from this

+---+-----+
| id|value|
+---+-----+
|  1|   65|
|  2|   66|
|  3|   65|
|  4|   68|
|  5|   71|
+---+-----+

to this

+---+-----+----------+
| id|value|prev_value|
+---+-----+----------+
|  1|   65|      null|
|  2|   66|        65|
|  3|   65|        66|
|  4|   68|        65|
|  5|   71|        68|
+---+-----+----------+