I'm after some information on how I can use NiFi to get a file from S3, send it to PySpark, transform it, and move it to another folder in a different bucket.
I have used this template https://gist.github.com/ijokarumawak/26ff675039e252d177b1195f3576cf9a to move data between buckets, which works fine.
But I'm a bit unsure of the next steps: how to pass a file to PySpark, run a script to transform it, and then put the result in another location. I have been looking at this https://pierrevillard.com/2016/03/09/transform-data-with-apache-nifi/ which I will try to understand.
If you know of or have any examples of how I might do this, or could describe how I might set it up, I would appreciate it.