Support Questions
Find answers, ask questions, and share your expertise

Hive Explain says "Plan not optimized by CBO due to missing statistics" but stats are present. What gives?

Contributor

I'm running an explain on a query and the result includes "Plan not optimized by CBO due to missing statistics. Please check log for more details"

1. All the tables in the query have had compute statistics run on them and the describe formatted output shows that stats are present and up to date. What is missing?

2. Which log file is the message referring to? I looked in Hive the hiveserver2.log and can see the log entries for the explain command here but there's no explanation on what stats it thinks are missing.

12 REPLIES 12

Master Collaborator

Did you run both table level and column level statistics:

analyze table t compute statistics;
analyze table t compute statistics for columns;

The log will be the hive client log (/tmp/<user>/hive.log) in case of hive cli, for hiveserver2 it will likely be /var/log/hive/hiveserver2.log (whatever you configured in Ambari).

Contributor

Yes, I ran both. I executed the explain command from within the Ambari Hive view ( I realize not optimal for the result it produces) so I checked the hiveserver2.log. I don't see any info there regarding why it thinks the stats are missing or incomplete.

@Ryan Templeton is your table partitioned? If I understood right, you must execute analyse command for each partition.

@Ryan Templeton @Deepesh @Guilherme Braccialli @gopal

Table is not partitioned.

create table batting (player_id STRING, year INT, runs INT);

Column stats = None

0: jdbc:hive2://phdns02:10000> analyze table batting compute statistics;

INFO : Session is already open

INFO : Dag name: analyze table batting compute statistics(Stage-0)

INFO :

INFO : Status: Running (Executing on YARN cluster with App id application_1448374612679_0002)

INFO : Map 1: 0/1

INFO : Map 1: 0(+1)/1

INFO : Map 1: 1/1

INFO : Table default.batting stats: [numFiles=1, numRows=95196, totalSize=1653324, rawDataSize=1558128]

No rows affected (6.775 seconds)

0: jdbc:hive2://phdns02:10000> analyze table batting compute statistics for columns ;

INFO : Session is already open

INFO : Dag name: analyze table batting compute stat...columns(Stage-0)

INFO :

INFO : Status: Running (Executing on YARN cluster with App id application_1448374612679_0002)

INFO : Map 1: 0(+1)/1Reducer 2: 0/1

INFO : Map 1: 1/1Reducer 2: 0/1

INFO : Map 1: 1/1Reducer 2: 0(+1)/1

INFO : Map 1: 1/1Reducer 2: 1/1

No rows affected (3.131 seconds)

586-screen-shot-2015-11-25-at-65410-am.png

@Ryan Templeton Launching a support case will be a good idea.

If you are getting message "plan not optimized by CBO" when your table has statistics, please, check your hive log and look if you don't have message below:

"parse.BaseSemanticAnalyzer: Not invoking CBO because the statement has too few joins"

This message in log mean CBO wasn't used just because it wasn't needed, as your query may have few joins. The warning message in explain is not clear, as it says only "plan not optimized by CBO", but it's not an issue if you see the extra message in log file.

@Guilherme Braccialli "CBO because the statement has too few joins" I guess it means that "missing statistics" message is normal.

Mentor

@Guilherme Braccialli can you add more detail to the answer, used this thread as reference and it's unclear what solution is.

@Artem Ervits, done, please check if it's clear now.

New Contributor

Hey guys! i am running into the same problem. When I look for stats for a specific column it showing me all other fileds like min, max etc as blank My column level stats are showing as blank. And when I take the plan on the query involving these tables i am getting the warning as Column stats = None Please help!

Master Collaborator

the answer was given by Guilherme Braccialli .

in your /tmp/<user>/hive.log you would see a message like this

2017-06-28 10:04:43,717 INFO  [main]: parse.BaseSemanticAnalyzer (CalcitePlanner.java:canCBOHandleAst(397)) - Not invoking CBO because the statement has too few joins

; ;