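I have about a million rows that I need to update. For each row, the update should look at the other rows for the same Object and Name in the same source data, find the value that occurs most often in a column (shape in the example below), and write that value to any row where it differs.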
For example.
```
Original DF:
sno  Object  Name    shape   rating
1    Fruit   apple   round   1.0
2    Fruit   apple   round   2.0
3    Fruit   apple   square  2.5
4    Fruit   orange  round   1.5

Required target DF:
sno  Object  Name    shape   rating
1    Fruit   apple   round   1.0
2    Fruit   apple   round   2.0
3    Fruit   apple   round   2.5   <-- automatically detect the difference in the shape column and update from square to round
4    Fruit   orange  round   1.5
```
Please advise: how can I achieve this in Hive SQL or PySpark?
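
To pin down the rule, here is a rough plain-Python sketch of the logic on the four sample rows. It assumes that rows are grouped by (Object, Name) and that a tie goes to the shape seen first; both are assumptions read off the example, and the function name `fill_with_majority_shape` is made up for illustration.

```python
from collections import Counter, defaultdict

# Sample rows from the example: (sno, Object, Name, shape, rating)
rows = [
    (1, "Fruit", "apple", "round", 1.0),
    (2, "Fruit", "apple", "round", 2.0),
    (3, "Fruit", "apple", "square", 2.5),
    (4, "Fruit", "orange", "round", 1.5),
]


def fill_with_majority_shape(rows):
    """Set each row's shape to the most frequent shape in its (Object, Name) group."""
    # Count how often each shape occurs per (Object, Name) group (assumed grouping key).
    counts = defaultdict(Counter)
    for _sno, obj, name, shape, _rating in rows:
        counts[(obj, name)][shape] += 1

    # Pick the most frequent shape per group; on a tie, the shape seen first wins.
    majority = {key: c.most_common(1)[0][0] for key, c in counts.items()}

    # Rebuild the rows with the majority shape, leaving all other columns unchanged.
    return [
        (sno, obj, name, majority[(obj, name)], rating)
        for sno, obj, name, _shape, rating in rows
    ]


target = fill_with_majority_shape(rows)
for row in target:
    print(row)
```

This only illustrates the rule on the sample data; it is not meant to scale to a million rows, which is why I am looking for the Hive SQL or PySpark equivalent.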