Spark create table from multiple jobs vs single job method

Explorer

Hi,

I have a table with a lot of data, and I want to create a new table from it based on some column values.

Which method is the most efficient and the most cluster-resource friendly?

 

Pseudo-code:

1. Single job:

        insert into myNewTable
        select * from myOldTable
        where a = xxx  (etc.)

2. Two jobs:

    Job 1: create a dataframe from the select statement

        select * from myOldTable
        where a = xxx  (etc.)

    Job 2: write the dataframe as a new table

        insert into myNewTable select * from dataframe

Accepted Solutions

Re: Spark create table from multiple jobs vs single job method

Guru
Hi,

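I do not think there is any difference. Spark evaluates statements lazily: defining the dataframe in your first step does not run anything, and the data is only read and filtered when the write is executed. So your second, two-job version should behave the same way as the first, single-job version, in my opinion.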
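
To make the lazy-evaluation argument concrete, here is a toy model in plain Python. It is only a sketch and not Spark code: `Table`, `LazyFrame`, `where`, `insert_into` and the sample rows are hypothetical names invented for this illustration. It assumes the dataframe in method 2 is simply written out, with no other action run on it in between.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Table:
    """Hypothetical in-memory table that counts how often it is scanned."""
    rows: List[Row]
    scans: int = 0

    def scan(self):
        self.scans += 1
        yield from self.rows


@dataclass
class LazyFrame:
    """Toy stand-in for a dataframe: it records filters and reads nothing."""
    source: Table
    predicates: List[Callable[[Row], bool]] = field(default_factory=list)

    def where(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Transformation: return a new plan without touching the data.
        return LazyFrame(self.source, self.predicates + [predicate])

    def insert_into(self, target: List[Row]) -> None:
        # Action: only here is the source scanned and the filter applied.
        for row in self.source.scan():
            if all(p(row) for p in self.predicates):
                target.append(row)


def make_old_table() -> Table:
    # Made-up sample data standing in for myOldTable.
    return Table([{"a": "xxx", "v": 1}, {"a": "yyy", "v": 2}, {"a": "xxx", "v": 3}])


# Method 1: filter and insert in a single statement.
old1, new1 = make_old_table(), []
LazyFrame(old1).where(lambda r: r["a"] == "xxx").insert_into(new1)

# Method 2: define the "dataframe" first, write it in a later step.
old2, new2 = make_old_table(), []
df = LazyFrame(old2).where(lambda r: r["a"] == "xxx")
scans_before_write = old2.scans  # still 0: defining df ran nothing
df.insert_into(new2)

print(new1 == new2, old1.scans, old2.scans, scans_before_write)
# prints: True 1 1 0
```

Both methods scan the source exactly once and produce the same rows. On a real cluster, one way to check this is to compare the plans printed by `DataFrame.explain()` for the two variants. The equivalence would no longer hold if an extra action, such as a `count()`, were run on the dataframe between the two steps, since each action triggers its own job.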

Cheers
Eric