Does pig use cost based optimizer?

Super Guru

Does Pig use a cost-based optimizer similar to Hive's?

3 REPLIES

Re: Does pig use cost based optimizer?

Guru

No. The cost-based optimizer (CBO) is only part of Hive, not Pig.

Re: Does pig use cost based optimizer?

As Ravi said, Pig doesn't use a cost-based optimizer like Hive's (which would need statistics), but it has its own optimizer that applies a set of rules to optimize execution graphs.

https://pig.apache.org/docs/r0.9.1/perf.html#optimization-rules
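
To make the difference concrete, here is a minimal Python sketch of what "rule-based" means: a rewrite is applied whenever its pattern matches the plan, without consulting any table statistics. This is a toy model and not Pig's actual implementation; the `Op` class, the `push_filter_before_sort` rule, and the plan contents are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One step in a toy linear execution plan (hypothetical, not a Pig class)."""
    kind: str       # e.g. "load", "sort", "filter", "store"
    arg: str = ""


def push_filter_before_sort(plan):
    """Rule: a filter directly after a sort can run before it.

    Dropping rows does not change the relative order of the remaining rows,
    so the swap is always safe and needs no statistics.
    """
    plan = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(len(plan) - 1):
            if plan[i].kind == "sort" and plan[i + 1].kind == "filter":
                plan[i], plan[i + 1] = plan[i + 1], plan[i]
                changed = True
    return plan


def optimize(plan, rules):
    """Apply each rewrite rule in turn; no cost model is involved."""
    for rule in rules:
        plan = rule(plan)
    return plan


plan = [
    Op("load", "logs"),
    Op("sort", "ts"),
    Op("filter", "status == 500"),
    Op("store", "out"),
]
optimized = optimize(plan, [push_filter_before_sort])
print([op.kind for op in optimized])  # ['load', 'filter', 'sort', 'store']
```

A cost-based optimizer would instead enumerate alternative plans and pick one using row counts and other statistics, which is the part Pig does not do.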

In the end, since it's a data flow language, you have much more control over the join order yourself and can freely select the join types. I think Pig only does very basic join optimization on its own.

https://pig.apache.org/docs/r0.9.1/perf.html#specialized-joins
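
As an example of picking a join type yourself: in Pig Latin you can append USING 'replicated' to a JOIN, listing the large relation first and the small one(s) after it, and the small relation is loaded into memory instead of being shuffled. The Python sketch below only illustrates that idea on in-memory lists; the function name and the sample data are hypothetical, and it says nothing about how Pig implements the join internally.

```python
def replicated_join(big_rows, small_rows, key):
    """Toy map-side join: hold the small relation in memory, stream the big one."""
    # Build an in-memory lookup table from the small relation.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    # Stream the big relation and probe the table; no sort or shuffle is needed.
    for row in big_rows:
        for match in lookup.get(row[key], []):
            yield {**match, **row}


# Made-up sample data: "orders" plays the large relation, "users" the small one.
orders = [
    {"user_id": 1, "item": "disk"},
    {"user_id": 2, "item": "ram"},
    {"user_id": 3, "item": "cpu"},
]
users = [
    {"user_id": 1, "name": "ann"},
    {"user_id": 2, "name": "bob"},
]

joined = list(replicated_join(orders, users, "user_id"))
for row in joined:
    print(row)
```

The point is that you, not an optimizer, decide that the second relation is small enough for this strategy; Pig has no statistics to make that call for you.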

Re: Does pig use cost based optimizer?

New Contributor

Apache Calcite is the query-optimization framework used by Hive's cost-based optimizer, and we just added a Pig adapter. This means that you can enter a SQL query, Calcite will optimize it, and the query is executed using Pig's execution engine.
