Spark: create table from multiple jobs vs. single job method

Contributor

Hi,

I have a table with a lot of data, and I want to create a new table based on some column values from this table.

Which method is more efficient and friendlier to cluster resources?

 

Pseudo-code:

1. Single job:

        INSERT INTO myNewTable
        SELECT * FROM myOldTable
        WHERE a = xxx etc.

2. Two jobs:

    Job 1: create a DataFrame from the select statement:

        SELECT * FROM myOldTable
        WHERE a = xxx etc.

    Job 2: write the DataFrame out as the new table:

        INSERT INTO myNewTable SELECT * FROM dataframe
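For illustration, here is roughly how the two variants look in Spark's Scala API (a sketch only; myOldTable, myNewTable, and the 'xxx' filter value are the placeholders from the pseudo-code above, and the session is assumed to have Hive support for saved tables):

    import org.apache.spark.sql.SparkSession

    // Assumed session setup; enableHiveSupport is only needed for Hive-backed tables.
    val spark = SparkSession.builder()
      .appName("create-table-example")
      .enableHiveSupport()
      .getOrCreate()

    // 1. Single job: one SQL statement does both the select and the insert.
    spark.sql("INSERT INTO myNewTable SELECT * FROM myOldTable WHERE a = 'xxx'")

    // 2. Two steps: build a DataFrame first, then write it into the table.
    val df = spark.sql("SELECT * FROM myOldTable WHERE a = 'xxx'")
    df.write.mode("append").insertInto("myNewTable")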

1 ACCEPTED SOLUTION

Super Guru
Hi,

I do not think there is any difference. Spark executes statements lazily: in your second, two-job version the DataFrame is only a query plan until the write is triggered, so it will behave the same way as the first, single-job version, in my opinion.
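One way to see this for yourself (a sketch, reusing the placeholder tables and the spark session from the question): explain() prints the same filter-and-scan plan for both variants, and no job actually runs until the write.

    // Nothing executes here; the DataFrame is just a lazily built query plan.
    val df = spark.sql("SELECT * FROM myOldTable WHERE a = 'xxx'")

    // Both variants print the same physical plan.
    df.explain()
    spark.sql("SELECT * FROM myOldTable WHERE a = 'xxx'").explain()

    // Only this write actually triggers a Spark job.
    df.write.mode("append").insertInto("myNewTable")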

Cheers
Eric
