When you work with the DataFrame API, Spark knows the structure (schema) of the data. This makes it possible to implement a query optimizer that builds an efficient query plan based on that structure and on the transformations applied.
In Spark, this optimization is done by the Catalyst optimizer. Catalyst processes the query plan in four phases: analysis, logical optimization, physical planning and code generation. The result is a DAG of RDDs.
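To make the idea of rule-based plan optimization more concrete, here is a toy sketch in plain Python. It is not Spark code. Catalyst is written in Scala and applies batches of rewrite rules to a tree until the tree stops changing. This sketch mimics that approach on a tiny expression tree. The node classes, `transform_up`, the two rules and `optimize` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical toy expression nodes (not Spark classes).
@dataclass(frozen=True)
class Literal:
    value: int

@dataclass(frozen=True)
class Attribute:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Multiply:
    left: "Expr"
    right: "Expr"

Expr = Union[Literal, Attribute, Add, Multiply]
Rule = Callable[[Expr], Expr]

def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Apply a rule bottom-up: rewrite the children first, then the node itself."""
    if isinstance(expr, (Add, Multiply)):
        expr = type(expr)(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)

def constant_folding(expr: Expr) -> Expr:
    """Replace an operation on two literals with its computed result."""
    if isinstance(expr, (Add, Multiply)) and isinstance(expr.left, Literal) and isinstance(expr.right, Literal):
        if isinstance(expr, Add):
            return Literal(expr.left.value + expr.right.value)
        return Literal(expr.left.value * expr.right.value)
    return expr

def simplify_add_zero(expr: Expr) -> Expr:
    """Rewrite `e + 0` and `0 + e` to `e`."""
    if isinstance(expr, Add):
        if expr.right == Literal(0):
            return expr.left
        if expr.left == Literal(0):
            return expr.right
    return expr

def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 10) -> Expr:
    """Run the rule batch repeatedly until the tree no longer changes (a fixed point)."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = transform_up(new_expr, rule)
        if new_expr == expr:
            break
        expr = new_expr
    return expr

# x + (1 + 2) * 0 is reduced to just x.
plan = Add(Attribute("x"), Multiply(Add(Literal(1), Literal(2)), Literal(0)))
print(optimize(plan, [constant_folding, simplify_add_zero]))  # prints Attribute(name='x')
```

Each rule is small and only knows how to rewrite one pattern. The optimizer gets its power from applying many such rules repeatedly. To see the plans Catalyst itself produces in real Spark, call `explain(True)` on a DataFrame. This prints the parsed, analyzed and optimized logical plans along with the physical plan.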
If you are interested in reading more about it, see the following link: