Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive Explain says "Plan not optimized by CBO due to missing statistics" but stats are present. What gives?

avatar
Rising Star

I'm running an explain on a query and the result includes "Plan not optimized by CBO due to missing statistics. Please check log for more details"

1. All the tables in the query have had compute statistics run on them and the describe formatted output shows that stats are present and up to date. What is missing?

2. Which log file is the message referring to? I looked in Hive the hiveserver2.log and can see the log entries for the explain command here but there's no explanation on what stats it thinks are missing.

12 REPLIES 12

avatar

Did you run both table level and column level statistics:

analyze table t compute statistics;
analyze table t compute statistics for columns;

The log will be the hive client log (/tmp/<user>/hive.log) in case of hive cli, for hiveserver2 it will likely be /var/log/hive/hiveserver2.log (whatever you configured in Ambari).

avatar
Rising Star

Yes, I ran both. I executed the explain command from within the Ambari Hive view ( I realize not optimal for the result it produces) so I checked the hiveserver2.log. I don't see any info there regarding why it thinks the stats are missing or incomplete.

avatar

@Ryan Templeton is your table partitioned? If I understood right, you must execute analyse command for each partition.

avatar
Master Mentor

@Ryan Templeton @Deepesh @Guilherme Braccialli @gopal

Table is not partitioned.

create table batting (player_id STRING, year INT, runs INT);

Column stats = None

0: jdbc:hive2://phdns02:10000> analyze table batting compute statistics;

INFO : Session is already open

INFO : Dag name: analyze table batting compute statistics(Stage-0)

INFO :

INFO : Status: Running (Executing on YARN cluster with App id application_1448374612679_0002)

INFO : Map 1: 0/1

INFO : Map 1: 0(+1)/1

INFO : Map 1: 1/1

INFO : Table default.batting stats: [numFiles=1, numRows=95196, totalSize=1653324, rawDataSize=1558128]

No rows affected (6.775 seconds)

0: jdbc:hive2://phdns02:10000> analyze table batting compute statistics for columns ;

INFO : Session is already open

INFO : Dag name: analyze table batting compute stat...columns(Stage-0)

INFO :

INFO : Status: Running (Executing on YARN cluster with App id application_1448374612679_0002)

INFO : Map 1: 0(+1)/1Reducer 2: 0/1

INFO : Map 1: 1/1Reducer 2: 0/1

INFO : Map 1: 1/1Reducer 2: 0(+1)/1

INFO : Map 1: 1/1Reducer 2: 1/1

No rows affected (3.131 seconds)

586-screen-shot-2015-11-25-at-65410-am.png

avatar
Master Mentor

avatar
Master Mentor

@Ryan Templeton Launching a support case will be a good idea.

avatar

If you are getting message "plan not optimized by CBO" when your table has statistics, please, check your hive log and look if you don't have message below:

"parse.BaseSemanticAnalyzer: Not invoking CBO because the statement has too few joins"

This message in log mean CBO wasn't used just because it wasn't needed, as your query may have few joins. The warning message in explain is not clear, as it says only "plan not optimized by CBO", but it's not an issue if you see the extra message in log file.

avatar
Master Mentor

@Guilherme Braccialli "CBO because the statement has too few joins" I guess it means that "missing statistics" message is normal.

avatar
Master Mentor

@Guilherme Braccialli can you add more detail to the answer, used this thread as reference and it's unclear what solution is.