I have a scenario where I need to write data from a dataframe to two internal tables. I have the code below, and it works.
df.write.mode('overwrite').parquet("/Table1Path")
df = hc.sql("insert into Table2 select * from Table1")
However, when I change the code as below to avoid the disk-to-disk round trip and the recomputation, it fails with "Executor not found", "container not found", or "block not found" errors.
from pyspark import StorageLevel

df.persist(StorageLevel.MEMORY_AND_DISK)
df.write.mode("append").saveAsTable("schema.tablename1", format="parquet")
df.write.mode("append").saveAsTable("schema.tablename2", format="parquet")
I have seen suggestions online to change the number of cores, the executor memory, the repartitioning, etc. But can you please let me know how to make Spark fail-proof irrespective of the volume of data? I selected MEMORY_AND_DISK specifically so that it would not cause memory issues.
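For reference, the disk-backed fallback I am trying to move away from would look roughly like the sketch below; the staging path is a placeholder. Writing once and re-reading means both table writes scan stable files instead of depending on cached blocks that die with a lost executor.

df.write.mode("overwrite").parquet("/staging/path")  # materialize once to reliable storage
df_stable = hc.read.parquet("/staging/path")         # re-read: stable input, no lineage recompute
df_stable.write.mode("append").saveAsTable("schema.tablename1", format="parquet")
df_stable.write.mode("append").saveAsTable("schema.tablename2", format="parquet")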
It may help to add a stack trace with more detail. It sounds like your executors may be hitting memory errors though, either a Java OOM or YARN killing the container for exceeding its memory limit. If that is the case, you will have to tune the memory settings.
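For example, something along these lines when building the context. The values are illustrative only and assume YARN plus the HiveContext setup that your hc suggests; tune them to your cluster.

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Illustrative values only. The YARN overhead is the off-heap headroom
# (in MB) that YARN checks before it kills a container.
conf = (
    SparkConf()
    .set("spark.executor.memory", "8g")
    .set("spark.yarn.executor.memoryOverhead", "2048")
    .set("spark.executor.cores", "4")
)
sc = SparkContext(conf=conf)
hc = HiveContext(sc)

Raising spark.yarn.executor.memoryOverhead in particular is a common fix when YARN is the one killing containers, since it covers allocations that live outside the JVM heap and are not bounded by spark.executor.memory.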