Member since
05-17-2017
3
Posts
0
Kudos Received
0
Solutions
05-18-2017
04:51 PM
Ah, I see. Thank you @Deepesh. I was able to confirm viewing the statistics of interest when specifying a specific partition. I have a follow up question based on that. Is there a single command to view the stats summed for all partitions or would I need to execute the above for each partition in the table and total the values myself?
... View more
05-17-2017
04:55 AM
Invalid syntax:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:35 cannot recognize input near ''2016-12-30'' ')' 'COMPUTE' in analyze statement<br>
According to the Hive documentation the partition column and name need to be specified if you are analyzing a particular partition. Specifying the column name only can be used to analyze all partitions.https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE
... View more
05-17-2017
02:48 AM
I am attempting to perform an ANALYZE on a partitioned table to generate statistics for numRows and totalSize. For a non-partitioned table I get the results I am looking for but for a dynamic partitioned table it does not provide the information I am seeking. As a newbie to Hive, I assume I am doing something wrong.
We are running Hive 1.2.1.2.5.
Test Case 1: CREATE TABLE `test_table` (`date` DATE,`name` VARCHAR(255));
INSERT INTO `test_table` VALUES
('2016-12-30','john'),('2016-12-30','sarah'),
('2016-12-31','bill'),
('2017-01-01','mary'),('2017-01-01','pete');
ANALYZE TABLE `test_table` COMPUTE STATISTICS;
SHOW tblproperties `test_table`;
The results are as I would expect: prpt_name prpt_value COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"} numFiles 1 numRows 5 rawDataSize 76 totalSize 81 transient_lastDdlTime 1494980403 Test Case 2: CREATE TABLE `test_table` (`name` VARCHAR(255)) PARTITIONED BY (`date` DATE);
INSERT INTO `test_table` PARTITION (`date`='2016-12-30')
VALUES ('john'),('sarah');
INSERT INTO `test_table` PARTITION (`date`='2016-12-31')
VALUES ('bill');
INSERT INTO `test_table` PARTITION (`date`='2017-01-01')
VALUES ('mary'),('pete');
ANALYZE TABLE `test_table` PARTITION(`date`) COMPUTE STATISTICS;
SHOW tblproperties `test_table`;
However, I do not see any of the statistics I would expect to see: prpt_name prpt_value transient_lastDdlTime 1494978354 Is there something I am doing wrong with the table creation or with the analyze statement? EDIT: Per suggestion, I did test using the following analyze statement as well with the same results: ANALYZE TABLE `test_table`
PARTITION(`date`='2016-12-30',`date`='2016-12-31',`date`='2017-01-01')
COMPUTE STATISTICS;
... View more
Labels:
- Labels:
-
Apache Hive