03-03-2025 01:24 AM
Writing the output back into the database results in the same error. However, when I checked the physical and logical plans of these operations, I noticed that Spark performs a "Relation" scan on the second table (over 900 GB), reading all of its columns instead of only the subset referenced in the query. So I translated the whole pipeline into SQL, and it returned the table as a DataFrame perfectly... Does anyone have an idea why Spark doesn't push down the filter and column-pruning operations?