dataframe to table/hdfs

New Contributor

Hi All,

 

I have a scenario where I need to write data from a DataFrame to two internal tables.

 

I have the code below, and it works.

 

 

Code1:

# write the DataFrame to the HDFS path that backs Table1
df.write.mode('overwrite').parquet("/Table1Path")
# copy the rows from Table1 into Table2 via SQL on the HiveContext (hc)
df = hc.sql("insert into Table2 select * from Table1")

However, when I change the code as below to avoid the disk-to-disk copy and the recomputation, it fails with "executor not found", "container not found", or "block not found" errors.

Code2:

   from pyspark import StorageLevel

   # cache the DataFrame so the second write can reuse it instead of recomputing
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.write.mode("append").saveAsTable("schema.tablename1", format="parquet")
   df.write.mode("append").saveAsTable("schema.tablename2", format="parquet")

I have seen suggestions online to change the number of cores, executor memory, repartitioning, etc. But can you please let me know how to make the Spark job fail-proof irrespective of the volume of data? I selected MEMORY_AND_DISK specifically so that it does not cause memory issues.
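For reference, here is a minimal sketch of a checkpoint-based variant that materializes the data once and truncates the lineage before the two table writes. The SparkSession setup, the source query, and the checkpoint directory /tmp/spark-checkpoints are assumptions for illustration, not part of the original code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# hypothetical HDFS directory used to store reliable checkpoints
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

# placeholder source query standing in for the real DataFrame
df = spark.sql("select * from source_table")

# checkpoint() writes the data to the checkpoint directory and cuts the
# lineage, so losing an executor no longer surfaces as "block not found"
df = df.checkpoint(eager=True)

# both writes read from the checkpointed data instead of recomputing the plan
df.write.mode("append").saveAsTable("schema.tablename1", format="parquet")
df.write.mode("append").saveAsTable("schema.tablename2", format="parquet")

The trade-off is one extra write to HDFS, similar to Code1, but each table is still written through a single, straightforward path.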

 

2 REPLIES

Re: dataframe to table/hdfs

New Contributor

This worked for smaller volumes; the issue occurs with huge volumes.

Re: dataframe to table/hdfs

Expert Contributor

It may help to add a stack trace with more detail. It sounds like your executors may be hitting memory errors, though, either a Java OOM or YARN killing the container because the memory limit was exceeded. If that is the case, you will have to experiment with the memory settings.
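If it does turn out to be memory, a minimal sketch of the kind of settings to experiment with follows; the values are illustrative assumptions, not recommendations, and on older Spark releases the overhead setting is spark.yarn.executor.memoryOverhead:

from pyspark.sql import SparkSession

# illustrative values only; tune to your cluster and data volume
spark = (SparkSession.builder
         .config("spark.executor.memory", "8g")           # JVM heap per executor
         .config("spark.executor.memoryOverhead", "2g")   # off-heap headroom for YARN
         .config("spark.executor.cores", "4")             # fewer cores => more memory per task
         .enableHiveSupport()
         .getOrCreate())

These are usually passed with --conf on spark-submit; setting them in the builder only takes effect if the session has not already been created.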
