Hi,
 
I have a table with a lot of data, and
I want to create a new table from it, filtered on some column values.
 
Which method is more efficient and friendlier to cluster resources?
 
Pseudo-Code
 
1. single job:
         insert into myNewTable
         select * from myOldTable
         where a=xxx etc.
 
2. two jobs:
    job 1: create a dataframe from the select statement
         select * from myOldTable
         where a=xxx etc. as dataframe

    job 2: write the dataframe as the new table
         insert into myNewTable select * from dataframe
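
For concreteness, the two options could be sketched in PySpark roughly as follows. This is only a sketch under assumptions that the post does not confirm: that the engine is Spark, that `spark` is an existing SparkSession, and that myNewTable already exists with the same column layout as myOldTable. The function names and the `condition` parameter are hypothetical, and `a = 'xxx'` stands in for the real filter.

```python
def insert_single_statement(spark, source, target, condition):
    """Option 1: a single SQL statement that filters and inserts."""
    # `spark` is assumed to be a pyspark.sql.SparkSession.
    spark.sql(f"INSERT INTO {target} SELECT * FROM {source} WHERE {condition}")


def insert_via_dataframe(spark, source, target, condition):
    """Option 2: build a DataFrame first, then write it to the target."""
    # Step 1: this only builds a query plan; DataFrame transformations
    # are lazy, so no data is read at this point.
    df = spark.table(source).where(condition)
    # Step 2: the write is the action that actually runs the query.
    # insertInto matches columns by position and needs an existing table.
    df.write.insertInto(target)
```

Either function would be called as, for example, `insert_via_dataframe(spark, "myOldTable", "myNewTable", "a = 'xxx'")`. Because Spark evaluates DataFrames lazily, the second version does not run anything until the write, so both options would typically end up executing a very similar plan. Comparing the plans with `explain()` or in the Spark UI would confirm this for a given setup.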