Disable Hive Metastore Statistics

Contributor

Hi All,

We currently have a Hive dynamic-partitioning INSERT INTO job that runs very quickly (minutes) in the YARN/MR stage but takes a very long time (hours) when it's loading the partitions. After reviewing the logs, I believe this is happening at the Hive Metastore level, where it's attempting to calculate the statistics for each partition involved in the load.

Can anyone please tell us how to disable the stats generation part of the insert process with Hive?

We have tried the following properties, but they do not prevent the Metastore from attempting to generate stats, as the query is not an INSERT OVERWRITE but an INSERT INTO (append). We validated that the job's properties were being picked up correctly by searching the configuration shown for the app in the YARN UI.

#Not Working Properties

set hive.stats.autogather=false;

set hive.stats.collect.rawdatasize=false;

set hive.analyze.stmt.collect.partlevel.stats=false;

If I am understanding this correctly, it's this Jira that is creating my problem, as it does not provide a way to go back to the default behaviour: https://issues.apache.org/jira/browse/HIVE-3959

PS - each partition will have gigabytes to terabytes of data. We are just trying to run this on a smaller amount right now, and it's already creating issues.


Re: Disable Hive Metastore Statistics

New Contributor

Joe, are you sure it's not the move task? That can take a really long time if you have thousands of partitions.

Re: Disable Hive Metastore Statistics

Contributor

Hi Carter,

It could be the move, but when I watch the logs I don't see it talking about the movement; it's talking about the calculation of the file stats. But I agree thousands of inode updates could cause this behavior. We have moved away from partitioning by this, even though it fits our access pattern exactly, and moved to partitioning by date with buckets on the identifiers.
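For anyone following along, here is a rough sketch of what a "partition by date, bucket on the identifier" layout might look like. The table names, column names, bucket count, and storage format are all hypothetical and not taken from our actual job; treat it as an illustration of the idea, not our exact DDL.

```sql
-- Hypothetical table: partitioned by date, bucketed on an identifier column.
-- Names, types, bucket count, and file format are illustrative assumptions.
CREATE TABLE events_by_date (
  id      STRING,
  payload STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- Allow the partition value to come from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- On Hive releases before 2.0, bucketing is only enforced on insert
-- when this is set (assumption: an older Hive version is in use).
SET hive.enforce.bucketing=true;

-- The dynamic partition column must be the last column in the SELECT.
-- staging_events is a hypothetical source table.
INSERT INTO TABLE events_by_date PARTITION (event_date)
SELECT id, payload, event_date
FROM staging_events;
```

The point of this layout is that a load touches only a handful of date partitions rather than thousands of identifier partitions, so there are far fewer per-partition Metastore updates, while the buckets still group rows by identifier.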

Re: Disable Hive Metastore Statistics

Cloudera Employee

@Joseph Niemiec Do you have the conf "fs.hdfs.impl.disable.cache" set to false in Hive? If not, it may run a lot of distcp jobs; this was a bug in earlier releases.

Re: Disable Hive Metastore Statistics

Contributor

set fs.hdfs.impl.disable.cache;

Getting log thread is interrupted, since query is done!

+-----------------------------------+--+
|                set                |
+-----------------------------------+--+
| fs.hdfs.impl.disable.cache=false  |

Re: Disable Hive Metastore Statistics

Mentor

@Joseph Niemiec are you still having issues with this? Can you accept best answer or provide your own solution?