Spark create table from multiple jobs vs single job method

ChineduLB — Fri, 16 Sep 2022 14:29:29 GMT

Hi,

I have a table with a lot of data.

I want to create a new table based on some column values from this table.

Which method is the most efficient and cluster-resource friendly?

Pseudo-Code

1. single job

insert into myNewTable

select * from myOldTable

where a=xxx etc.

2. two jobs:

job1. create dataframe from select statement

select * from myOldTable

where a=xxx etc. as dataframe

job2 write dataframe as new table

insert into myNewTable select * from dataframe

Re: Spark create table from multiple jobs vs single job method

EricL — Mon, 15 Jul 2019 10:12:48 GMT

Hi,

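I do not think there is any difference. Spark executes statements lazily: the select in your job1 only defines the dataframe, and nothing is read until the write in job2 runs. So your second, two-job version will behave the same way as the first, single-job version, in my opinion.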

Cheers
Eric
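
To illustrate the point about lazy execution, here is a minimal sketch in plain Python. It is a toy model, not the Spark API: `Table`, `LazyFrame`, `where` and `insert_into` are hypothetical names made up for this example. It only shows why defining a filtered dataframe first and writing it later should cost the same single scan as doing both in one statement. In real Spark you could compare the two plans with `DataFrame.explain()`. The equivalence no longer holds if you run an extra action (for example a count) on the intermediate dataframe before the write.

```python
class Table:
    """Toy table that counts how often its rows are actually read."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


class LazyFrame:
    """Toy stand-in for a lazily evaluated dataframe (hypothetical, not Spark)."""

    def __init__(self, table, predicates=()):
        self.table = table
        self.predicates = tuple(predicates)

    def where(self, predicate):
        # Transformation: only extends the plan, reads nothing.
        return LazyFrame(self.table, self.predicates + (predicate,))

    def insert_into(self, target):
        # Action: the only place where the source table is scanned.
        for row in self.table.scan():
            if all(p(row) for p in self.predicates):
                target.rows.append(row)


def single_step(old, new):
    # Method 1: select and insert in one statement.
    LazyFrame(old).where(lambda r: r["a"] == "xxx").insert_into(new)


def two_steps(old, new):
    # Method 2, "job1": define the filtered dataframe (no data is read yet).
    df = LazyFrame(old).where(lambda r: r["a"] == "xxx")
    # Method 2, "job2": write it out (the single scan happens here).
    df.insert_into(new)


rows = [{"a": "xxx", "b": 1}, {"a": "yyy", "b": 2}, {"a": "xxx", "b": 3}]
old1, new1 = Table(rows), Table()
old2, new2 = Table(rows), Table()

single_step(old1, new1)
two_steps(old2, new2)

# Same result and the same number of scans for both methods.
print(new1.rows == new2.rows, old1.scans, old2.scans)  # True 1 1
```

Both methods scan the old table exactly once and produce the same rows, which is the behaviour described above.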
