Hi Friends,
I am confused about how the RDD lineage, the DAG, the DAG scheduler, stages, and tasks are created.
Please validate my understanding:
1) After the application is submitted, whatever transformations appear in the code before an action is called are only recorded. Each resulting RDD keeps a history of how it was built: which RDD is its parent, which transformation created it, and what its dependencies are. This history is called the lineage (logical execution plan).
2) When an action is called on an RDD, the lineage is converted into a DAG (physical execution plan).
3) The DAG (physical execution plan) is submitted to the DAG scheduler, which in turn splits the DAG into stages.
4) Each stage has a list of tasks.
5) Each task runs in an executor. (Does one executor run one task on one partition?)
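
To make points 1 to 5 concrete, here is a toy Python model of how I picture it. This is not Spark code: `ToyRDD`, `narrow`, `wide`, `lineage`, `split_into_stages`, and `task_count` are names I made up for illustration. I am assuming that a stage ends wherever a shuffle (wide transformation such as `reduceByKey`) happens, and that a stage gets one task per partition. Please correct me if the model itself is wrong.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: it only remembers its parent and how it was derived."""
    name: str
    num_partitions: int
    parent: Optional["ToyRDD"] = None
    shuffle: bool = False  # True if built from its parent by a wide transformation

    def narrow(self, name: str) -> "ToyRDD":
        # Narrow transformation (map, filter, ...): keeps the parent's partitioning.
        return ToyRDD(name, self.num_partitions, parent=self, shuffle=False)

    def wide(self, name: str, num_partitions: int) -> "ToyRDD":
        # Wide transformation (reduceByKey, ...): data is repartitioned by a shuffle.
        return ToyRDD(name, num_partitions, parent=self, shuffle=True)


def lineage(rdd: ToyRDD) -> List[ToyRDD]:
    """Walk parent links back to the source; return the chain from source to rdd."""
    chain = []
    node = rdd
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain[::-1]


def split_into_stages(rdd: ToyRDD) -> List[List[ToyRDD]]:
    """Cut the lineage into stages; a new stage starts at every shuffle (my assumption)."""
    stages = []
    current = []
    for node in lineage(rdd):
        if node.shuffle and current:
            stages.append(current)
            current = []
        current.append(node)
    stages.append(current)
    return stages


def task_count(stage: List[ToyRDD]) -> int:
    """Assumed rule: one task per partition of the last RDD in the stage."""
    return stage[-1].num_partitions


# Word-count-shaped pipeline: nothing "runs" here, only the lineage is recorded.
lines = ToyRDD("textFile", num_partitions=4)
counts = lines.narrow("flatMap").narrow("map").wide("reduceByKey", num_partitions=2)

# Pretend an action (e.g. collect) was called on `counts`.
stages = split_into_stages(counts)
stage_names = [[r.name for r in s] for s in stages]
task_counts = [task_count(s) for s in stages]
print(stage_names)  # [['textFile', 'flatMap', 'map'], ['reduceByKey']]
print(task_counts)  # [4, 2]
```

In this model the first stage would run 4 tasks (one per input partition) and the second stage 2 tasks (one per shuffle partition). What I am unsure about is how those tasks are then assigned to executors.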
Also, I want to understand where the Catalyst optimizer and the Tungsten encoder come into the picture.
Is it the Catalyst optimizer's responsibility to convert the RDD lineage into the best optimized execution plan, in the form of a DAG?
Is it the Tungsten encoder's responsibility to convert the Scala code into bytecode?
Please help me understand the above.