03-03-2025 01:24 AM
Writing the output back into the database results in the same error. However, when I checked the physical and logical plans of these operations, I noticed that Spark performs a "Relation" scan on the second table (over 900 GB), reading all of its columns instead of only the subset referenced in the query. So I translated the whole pipeline into SQL, and it returned the table as a DataFrame perfectly... Does anyone have an idea why Spark doesn't push down the filter and column-pruning operations?