Is there a tutorial that shows a practical example of data cleansing with Spark (especially with Python)?
There are several Spark tutorials that use the Sandbox available on the Hortonworks website. You may be interested in the tutorial "Interacting with Data on HDP Using Apache Zeppelin and Apache Spark". We also offer online training through Hortonworks University focused on Data Science and Spark.