
What is the relationship between RDD lineage, the DAG, the DAG Scheduler, stages, and tasks in Spark, and which one creates which?

New Contributor

Hi Friends,

I am confused about how the RDD lineage, DAG, DAG Scheduler, stages, and tasks are created.

Please validate my understanding:

1) After we submit a job, but before an action is called, whatever transformations appear in the code before the action are recorded on the RDD as its lineage: which RDD is its parent, which transformations were applied to create it, and what its dependencies are. This is called the lineage (logical execution plan).

2) When an action is called on the RDD, the lineage is converted into a DAG (physical execution plan).

3) The DAG (physical execution plan) is submitted to the DAG Scheduler, which in turn splits the DAG into stages.

4) Each stage has a list of tasks.

5) Each task runs in an executor. (Does one executor run one task on one partition?)
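
To make steps 2 to 5 concrete, here is a toy, pure-Python sketch. It is not Spark code and does not use Spark's API: `ToyRDD`, `lineage`, `split_into_stages`, and `tasks_for` are hypothetical names invented for illustration. It only models two behaviours of the real scheduler, namely that stages are cut at shuffle (wide) dependencies and that a stage gets one task per partition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: name, partition count, parent link."""
    name: str
    num_partitions: int
    parent: Optional["ToyRDD"] = None
    # True if this RDD depends on its parent through a shuffle (wide dependency).
    shuffle: bool = False


def lineage(rdd: ToyRDD) -> List[ToyRDD]:
    """Walk parent links back to the source and return the chain, oldest first."""
    chain = []
    node = rdd
    while node is not None:
        chain.append(node)
        node = node.parent
    return list(reversed(chain))


def split_into_stages(rdd: ToyRDD) -> List[List[ToyRDD]]:
    """Cut the lineage into stages, starting a new stage at every shuffle."""
    stages: List[List[ToyRDD]] = [[]]
    for node in lineage(rdd):
        if node.shuffle and stages[-1]:
            stages.append([])
        stages[-1].append(node)
    return stages


def tasks_for(stage: List[ToyRDD]) -> List[str]:
    """Create one task label per partition of the stage's last RDD."""
    last = stage[-1]
    return [f"{last.name}[partition {i}]" for i in range(last.num_partitions)]


# A word-count-shaped lineage: three narrow steps, then one shuffle.
lines = ToyRDD("textFile", 4)
words = ToyRDD("flatMap", 4, parent=lines)
pairs = ToyRDD("map", 4, parent=words)
counts = ToyRDD("reduceByKey", 2, parent=pairs, shuffle=True)

stages = split_into_stages(counts)
for i, stage in enumerate(stages):
    names = [r.name for r in stage]
    print(f"stage {i}: {names} -> {len(tasks_for(stage))} tasks")
```

In this sketch the three narrow transformations land in one stage with four tasks, and the shuffle starts a second stage with two tasks. On the question in step 5: in Spark each task processes one partition, but an executor can run several tasks at the same time, depending on how many cores it has been given.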

Also, I want to understand where the Catalyst optimizer and the Tungsten encoder come into the picture.

Is it the responsibility of the Catalyst optimizer to convert the RDD lineage into the best optimized execution plan as a DAG?

Is it the responsibility of the Tungsten encoder to convert the Scala code into bytecode?

Please help me understand the above.


Re: What is the relationship between RDD lineage, the DAG, the DAG Scheduler, stages, and tasks in Spark, and which one creates which?

Contributor

Hi @bsuren123,

I think it's all about the JVM, for converting Scala code into bytecode.

Thanks

HadoopHelp
