
Spark create table from multiple jobs vs single job method

Rising Star

Hi,

 

I have a table with a lot of data, and I want to create a new table based on some column values from this table.

Which method is more efficient and friendlier on cluster resources?

 

Pseudo-Code

 

1. Single job:

        insert into myNewTable
        select * from myOldTable
        where a = xxx etc.

 

2. Two jobs:

    Job 1: create a DataFrame from the select statement

        select * from myOldTable
        where a = xxx etc.   (as a DataFrame)

    Job 2: write the DataFrame as a new table

        insert into myNewTable select * from dataframe
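For concreteness, here is a minimal PySpark sketch of both options. It assumes Hive support is enabled, that myNewTable already exists with a compatible schema, and that the table name, column a, and the filter value 'xxx' are just the placeholders from the pseudo-code above:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Option 1: a single SQL statement; Spark plans and runs it as one write job.
    spark.sql("""
        INSERT INTO myNewTable
        SELECT * FROM myOldTable
        WHERE a = 'xxx'
    """)

    # Option 2: build a DataFrame first, then write it out.
    df = spark.table("myOldTable").where("a = 'xxx'")  # no work happens yet (lazy)
    df.write.insertInto("myNewTable")                  # the actual job runs here

In both cases the select is only materialized when the insert/write runs, which is the point the accepted answer below makes.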

1 ACCEPTED SOLUTION

Super Guru
Hi,

I do not think there is any difference. Spark executes statements lazily, so your second, two-job version will behave the same way as the first single-job version, in my opinion.

Cheers
Eric
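
As a quick illustration of the lazy-execution point, reusing the hypothetical names from the sketch above: defining the DataFrame only records a plan, and the work happens at the write action.

    df = spark.table("myOldTable").where("a = 'xxx'")  # no job runs yet; Spark only records the plan
    df.explain()                                       # prints the physical plan without executing it
    df.write.insertInto("myNewTable")                  # execution happens at this action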
