Explain the Catalyst query optimizer in Apache Spark

What is the Catalyst query optimizer in Apache Spark, and how does it work?

Re: Explain catalyst query optimizer in Apache Spark

@Dukool SHarma

When you work with the DataFrame API, Spark is aware of the structure (schema) of the data. That makes it practical to have a query optimizer that builds an efficient query plan based on the underlying data structure and the transformations applied.

In Spark this optimization is done by the Catalyst optimizer. Catalyst works on the query plan in four phases: analysis, logical optimization, physical planning, and code generation. The end result is a DAG of RDDs.
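To give a feel for what a rule-based optimization over a plan looks like, here is a minimal toy sketch in plain Python. It is not Spark's code: the classes `Literal`, `Attribute`, `Add` and the helpers `constant_fold` and `transform_up` are hypothetical names made up for illustration. It only shows the general idea of rewriting an expression tree with a rule, here constant folding of `x + (1 + 2)`.

```python
from dataclasses import dataclass


# Toy expression tree. These classes are hypothetical and only
# imitate the idea of a plan tree; they are not Spark's classes.
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def constant_fold(node):
    """Rule: replace Add(Literal, Literal) with a single Literal."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node


def transform_up(node, rule):
    """Rewrite the children first, then apply the rule to the node itself."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


# x + (1 + 2): the constant part can be computed once, before execution.
plan = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(plan, constant_fold)
print(optimized)
```

In Spark itself you can look at the real plans by calling `explain(True)` on a DataFrame, which prints the parsed, analyzed, and optimized logical plans followed by the physical plan.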

If you are interested in reading more about it, have a look at the following link:

https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html

HTH

