Hello!
Currently, I am working on a project using pySpark that reads in a few Hive tables, stores them as dataframes, and I have to perform a few updates/filters on them. I am avoiding using Spark syntax at all costs to create a framework that will only take SQL in a parameter file that will be run using my pySpark framework.
Now the problem is that I have to perform UPDATE/DELETE queries on my final dataframe, are there any possible work arounds to performing these operations on my dataframe?
Thank you so much!