Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Configuring Hive to Store Statistics in MySQL

Highlighted

Configuring Hive to Store Statistics in MySQL

New Contributor

i am trying to publish hive stats into hive metastore (mysql). i have configured below properties. i have created new metastore HOST and pointing to tables in another metastore.
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:mysql</value>
</property>
<property>
  <name>hive.stats.jdbcdriver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:mysql://localhost/stats_db_name?createDatabaseIfNotExist=true&amp;user=hive&amp;password=passwd</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/share/java/mysql-connector-java.jar</value>
</property>

while running analyze query it is throwing below exception.

[Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30001]: StatsPublisher cannot be initialized. There was a error in the initialization of StatsPublisher, and retrying might help. If you dont want the query to fail because accurate statistics could not be collected, set hive.stats.reliable=false

while trying to debug i have got below error in yarn logs for analyze query applicationId.

2015-09-11 17:36:29,120 ERROR [TezChild] jdbc.JDBCStatsPublisher: Error during publishing statistics.
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'stats_db_name.PARTITION_STATS_V2' doesn't exist
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2415)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2333)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2318)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:150)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:163)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.publishStats(TableScanOperator.java:305)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:226)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:309)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:177)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-09-11 17:36:29,121 INFO [TezChild] exec.TableScanOperator: publishing : default.tdcdv1c_stg_ras_act_ord_bill_details_o/000000/ : {numRows=100, rawDataSize=14842}
2015-09-11 17:36:29,122 INFO [TezChild] exec.TableScanOperator: 0 Close done
2015-09-11 17:36:29,122 INFO [TezChild] exec.MapOperator: 1 Close done
2015-09-11 17:36:29,126 INFO [TezChild] task.TezTaskRunner: Closing task, taskAttemptId=attempt_1441004947604_36519_1_00_000000_0
2015-09-11 17:36:29,126 INFO [TezChild] output.MROutput: Flushing Simple Output
2015-09-11 17:36:29,126 INFO [TezChild] output.MROutput: Flushed Simple Output
2015-09-11 17:36:29,222 INFO [TezChild] task.TezTaskRunner: Task completed, taskAttemptId=attempt_1441004947604_36519_1_00_000000_0, fatalErrorOccurred=false
2015-09-11 17:36:29,244 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: Final Counters : Counters: 14 [[File System Counters HDFS_BYTES_READ=810594, HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=108, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter GC_TIME_MILLIS=29, CPU_MILLISECONDS=3360, PHYSICAL_MEMORY_BYTES=297529344, VIRTUAL_MEMORY_BYTES=7876644864, COMMITTED_HEAP_BYTES=2118123520, INPUT_RECORDS_PROCESSED=100, OUTPUT_RECORDS=0][HIVE DESERIALIZE_ERRORS=0, RECORDS_IN_Map_1=100]]
2015-09-11 17:36:29,348 INFO [main] common.TezUtilsInternal: Redirecting log file based on addend:


can you please let me know if i am missing anything in connection string.

2 REPLIES 2
Highlighted

Re: Configuring Hive to Store Statistics in MySQL

Master Guru
The statistics table pre-exists within the Hive metastore DB namespace. You're passing that config an entirely different DB name (stats_db_name) instead, leading to it being unable to find the table schema under it.

Are you wanting to store it under another DB intentionally?

Re: Configuring Hive to Store Statistics in MySQL

Contributor

I'm having the same issue. I want to store in the re-defined MySQL database as I don't want to use stats through Impala.

 

The documentation at Cloudera isn't quite clear for the newer version. I followed for v5.2 but seems that Hive has changed its implementation of Stats since then. Can I use the older implementation still? 

Don't have an account?
Coming from Hortonworks? Activate your account here