dataframe to table/hdfs

New Contributor

Hi All,

I have a scenario where I need to write data from a dataframe to two internal tables.

I have the code below, and it works.

Code1:

# Write the DataFrame out as Parquet under Table1's HDFS path
df.write.mode('overwrite').parquet("/Table1Path")
# Then copy Table1 into Table2 with Hive SQL (hc is the HiveContext)
df = hc.sql("insert into Table2 select * from Table1")

However, when I change the code as below to avoid the disk-to-disk copy and the recomputation, it fails with either "Executor not found", "container not found", or "block not found".

Code2:

from pyspark import StorageLevel

df.persist(StorageLevel.MEMORY_AND_DISK)
df.write.mode("append").saveAsTable("schema.tablename1", format="parquet")
df.write.mode("append").saveAsTable("schema.tablename2", format="parquet")

I have seen suggestions online to change the number of cores, the memory settings, the repartitioning, etc. But can you please let me know how to make Spark fail-proof irrespective of the volume of data? I selected MEMORY_AND_DISK specifically so that it does not cause memory issues.
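To make the intent of the caching explicit, the flow I am aiming for is roughly the sketch below (the count() to force materialization and the final unpersist() are for illustration only, and the table names are placeholders):

from pyspark import StorageLevel

# Cache the DataFrame so both writes reuse the same computed blocks,
# spilling to local disk when they do not fit in executor memory
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # force materialization once, before either write starts

df.write.mode("append").saveAsTable("schema.tablename1", format="parquet")
df.write.mode("append").saveAsTable("schema.tablename2", format="parquet")

df.unpersist()  # release the cached blocks afterwards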

2 REPLIES

Re: dataframe to table/hdfs

New Contributor

This worked for a smaller volume of data; the issue occurs with a huge volume.

Re: dataframe to table/hdfs

Expert Contributor

It may help to add a stack trace with more detail. It sounds like your executors may be hitting memory errors, though: either a Java OOM, or YARN killing the container because its memory limit was exceeded. If that is the case, you will have to tune the memory settings.
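For example, something along these lines could be a starting point (illustrative values only, assuming Spark 2.x with a SparkSession; the right numbers depend on your cluster and data size, and on older Spark/YARN versions the overhead setting is spark.yarn.executor.memoryOverhead):

from pyspark.sql import SparkSession

# Illustrative memory settings only - tune to your cluster and data volume
spark = (SparkSession.builder
         .appName("df-to-two-tables")
         .config("spark.executor.memory", "8g")            # executor JVM heap
         .config("spark.executor.memoryOverhead", "2g")    # off-heap headroom so YARN does not kill the container
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "400")    # smaller tasks help with large inputs
         .enableHiveSupport()
         .getOrCreate())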