
PySpark DataFrame or Hive SQL: update based on predecessor value?



I have about a million rows that I need to update. For each row, the update should look at the preceding rows in the same source data, find the value with the highest count, and use it to replace the value on the one row that differs.

For example.

Original DF:

```
sno  Object  Name    shape   rating
1    Fruit   apple   round   1.0
2    Fruit   apple   round   2.0
3    Fruit   apple   square  2.5
4    Fruit   orange  round   1.5
```

Required target DF:

```
sno  Object  Name    shape   rating
1    Fruit   apple   round   1.0
2    Fruit   apple   round   2.0
3    Fruit   apple   round   2.5   <-- automatically detect the difference in the shape column and update from square to round
4    Fruit   orange  round   1.5
```

Please advise how to achieve this in Hive SQL or PySpark.
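To pin down the rule being asked for, here is a minimal plain-Python sketch of one possible reading, not a PySpark or Hive implementation. It assumes that "predecessor" means rows with a smaller `sno` in the same (`Object`, `Name`) group, that a row with no predecessors is left unchanged, and that counts are taken over the original values. The function name `fix_shapes` is hypothetical.

```python
from collections import Counter, defaultdict

# Sample rows from the example above: (sno, Object, Name, shape, rating)
rows = [
    (1, "Fruit", "apple", "round", 1.0),
    (2, "Fruit", "apple", "round", 2.0),
    (3, "Fruit", "apple", "square", 2.5),
    (4, "Fruit", "orange", "round", 1.5),
]

def fix_shapes(rows):
    """Set each row's shape to the most frequent shape among its predecessors."""
    # (Object, Name) -> counts of the shapes seen so far in that group
    seen = defaultdict(Counter)
    fixed = []
    for sno, obj, name, shape, rating in sorted(rows):  # process in sno order
        counts = seen[(obj, name)]
        # Assumption: a row with no predecessors keeps its own shape.
        # On a tie, the shape that was seen first wins.
        new_shape = counts.most_common(1)[0][0] if counts else shape
        fixed.append((sno, obj, name, new_shape, rating))
        # Assumption: count the original value, not the corrected one.
        counts[shape] += 1
    return fixed

target = fix_shapes(rows)
for row in target:
    print(row)
```

On the sample data this changes only row 3, from square to round. If the rule is meant to use the most frequent value of the whole group rather than only the preceding rows, the counting step would differ; in PySpark that variant would likely be a per-group count joined back to the DataFrame, which is not shown here.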
