Created 11-24-2015 11:10 PM
I'm running an explain on a query and the result includes "Plan not optimized by CBO due to missing statistics. Please check log for more details"
1. All the tables in the query have had compute statistics run on them and the describe formatted output shows that stats are present and up to date. What is missing?
2. Which log file is the message referring to? I looked in Hive the hiveserver2.log and can see the log entries for the explain command here but there's no explanation on what stats it thinks are missing.
Created 11-24-2015 11:24 PM
Did you run both table level and column level statistics:
analyze table t compute statistics; analyze table t compute statistics for columns;
The log will be the hive client log (/tmp/<user>/hive.log) in case of hive cli, for hiveserver2 it will likely be /var/log/hive/hiveserver2.log (whatever you configured in Ambari).
Created 11-25-2015 12:00 AM
Yes, I ran both. I executed the explain command from within the Ambari Hive view ( I realize not optimal for the result it produces) so I checked the hiveserver2.log. I don't see any info there regarding why it thinks the stats are missing or incomplete.
Created 11-25-2015 11:52 AM
@Ryan Templeton is your table partitioned? If I understood right, you must execute analyse command for each partition.
Created on 11-25-2015 11:54 AM - edited 08-19-2019 05:45 AM
@Ryan Templeton @Deepesh @Guilherme Braccialli @gopal
Table is not partitioned.
create table batting (player_id STRING, year INT, runs INT);
Column stats = None
0: jdbc:hive2://phdns02:10000> analyze table batting compute statistics;
INFO : Session is already open
INFO : Dag name: analyze table batting compute statistics(Stage-0)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1448374612679_0002)
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Table default.batting stats: [numFiles=1, numRows=95196, totalSize=1653324, rawDataSize=1558128]
No rows affected (6.775 seconds)
0: jdbc:hive2://phdns02:10000> analyze table batting compute statistics for columns ;
INFO : Session is already open
INFO : Dag name: analyze table batting compute stat...columns(Stage-0)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1448374612679_0002)
INFO : Map 1: 0(+1)/1Reducer 2: 0/1
INFO : Map 1: 1/1Reducer 2: 0/1
INFO : Map 1: 1/1Reducer 2: 0(+1)/1
INFO : Map 1: 1/1Reducer 2: 1/1
No rows affected (3.131 seconds)
Created 11-25-2015 07:38 PM
Created 11-25-2015 08:31 PM
@Ryan Templeton Launching a support case will be a good idea.
Created 02-20-2016 01:49 PM
If you are getting message "plan not optimized by CBO" when your table has statistics, please, check your hive log and look if you don't have message below:
"parse.BaseSemanticAnalyzer: Not invoking CBO because the statement has too few joins"
This message in log mean CBO wasn't used just because it wasn't needed, as your query may have few joins. The warning message in explain is not clear, as it says only "plan not optimized by CBO", but it's not an issue if you see the extra message in log file.
Created 02-20-2016 01:57 PM
@Guilherme Braccialli "CBO because the statement has too few joins" I guess it means that "missing statistics" message is normal.
Created 03-15-2016 01:54 AM
@Guilherme Braccialli can you add more detail to the answer, used this thread as reference and it's unclear what solution is.