08-21-2018 09:27 PM
Environment: CentOS 7, Ambari 2.7.0.0

Error: After running ambari-server setup and deploying the desired configuration via Ambari, the following error appeared in ambari-server.log:
2018-08-21 20:09:05,733 WARN [ambari-client-thread-36] HttpChannel:507 - /api/v1/clusters//requests/3
org.springframework.security.web.firewall.RequestRejectedException: The request was rejected because the URL was not normalized.
at org.springframework.security.web.firewall.StrictHttpFirewall.getFirewalledRequest(StrictHttpFirewall.java:123)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:193)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)
at org.apache.ambari.server.api.MethodOverrideFilter.doFilter(MethodOverrideFilter.java:73)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)
at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:53)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)
at org.apache.ambari.server.security.AbstractSecurityHeaderFilter.doFilter(AbstractSecurityHeaderFilter.java:130)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:51)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:541)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1592)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1239)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:481)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1561)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1141)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:561)
at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:221)
at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:210)
at org.apache.ambari.server.controller.AmbariHandlerList.handle(AmbariHandlerList.java:140)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:564)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:122)
at org.eclipse.jetty.util.thread.strategy.ExecutingExecutionStrategy.invoke(ExecutingExecutionStrategy.java:58)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:201)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:133)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)
at java.lang.Thread.run(Thread.java:745)
The Ambari UI would be stuck at something like this:

Workaround

Try the "retry" option; for me it went past the "waiting" state and on to the next error, which is a known issue:

File "/usr/lib/ambari-agent/lib/resource_management/core/source.py", line 197, in get_content
    raise Fail("Failed to download file from {0} due to HTTP error: {1}".format(self.url, str(ex)))
resource_management.core.exceptions.Fail: Failed to download file from http://xlhive3.openstacklocal:8080/resources/mysql-connector-java.jar due to HTTP error: HTTP Error 404: Not Found

Try downloading the mysql-connector jar and naming it appropriately. If that doesn't work, copy the jar under /var/lib/ambari-server/resources, then restart ambari-server and retry the setup.
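The jar placement above can be sketched as follows. This is a simulation in a temp directory so the steps can be sanity-checked anywhere; on a real Ambari server the target is /var/lib/ambari-server/resources, which Ambari serves at http://&lt;ambari-server&gt;:8080/resources/ (as in the 404 URL above), and the jar itself comes from your MySQL connector download.

```shell
# Simulated in a temp dir; on a real host, replace $RESOURCES with
# /var/lib/ambari-server/resources and use the actual downloaded jar.
WORK=$(mktemp -d)
touch "$WORK/mysql-connector-java-5.1.45-bin.jar"   # stand-in for the downloaded jar
RESOURCES="$WORK/resources"                         # stand-in for /var/lib/ambari-server/resources
mkdir -p "$RESOURCES"
# The agent fetches /resources/mysql-connector-java.jar, so the file must
# carry exactly that name:
cp "$WORK/mysql-connector-java-5.1.45-bin.jar" "$RESOURCES/mysql-connector-java.jar"
ls "$RESOURCES"
# then: ambari-server restart, and retry the failed operation in the UI
```
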
04-27-2018 02:07 AM
Problem: While using LLAP, you might see a MetaException complaining about incorrect MySQL server syntax. This can happen at any time, even if things were working a second ago. Here is the error output; the client retries 10 times by default, so it can get lengthy.

0: jdbc:hive2://xlautomation-1.h.c:10500/defa> analyze table L3_MONTHLY_dw_dimRisk compute statistics for columns;
Getting log thread is interrupted, since query is done!
Error: Error while compiling statement: FAILED: RuntimeException Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient (state=42000,code=40000)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: RuntimeException Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:277)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:263)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:303)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
at org.apache.hive.beeline.Commands.execute(Commands.java:871)
at org.apache.hive.beeline.Commands.sql(Commands.java:729)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1000)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:835)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:793)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:493)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:476)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: RuntimeException Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:376)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:278)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:312)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:517)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:504)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1497)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1482)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1667)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3627)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3679)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3659)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:358)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1300)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1272)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
... 15 more
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException:null
at sun.reflect.GeneratedConstructorAccessor105.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1665)
... 26 more
Caused by: MetaException(message:javax.jdo.JDODataStoreException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:388)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:213)
at org.apache.hadoop.hive.metastore.ObjectStore.getObjectCount(ObjectStore.java:1294)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionCount(ObjectStore.java:1277)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy28.getPartitionCount(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updateMetrics(HiveMetaStore.java:6960)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:451)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:7034)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:140)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
at sun.reflect.GeneratedConstructorAccessor105.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1665)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3627)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3679)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3659)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:358)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1300)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1272)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:278)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:312)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:517)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:504)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1497)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1482)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Assumptions

- We are using MySQL as the backing metastore DB
- The error shows up only while running LLAP; HS2 works fine
- Issue encountered on the HDP-2.6.4 distro

Issue

This is a known bug with older versions of mysql-connector: https://bugs.mysql.com/bug.php?id=66659. It appears to have been addressed in "mysql-connector-java-5.1.26" and above.

Solution(s)

Use a mysql-connector version compatible with your MySQL backend DB version. If you do not see issues with HS2, copy the mysql-connector JAR on the host where HSI (HiveServer2 Interactive) is installed into "/usr/hdp/2.6.4.0-91/hive2/lib/". There is no softlink there, so the jar must be renamed to "mysql-connector-java.jar".

Validating the version

On the host that serves the HiveServer2 Interactive instance, look for mysql-connector*:

[root@xlautomation-2 ~]# find /usr/hdp/2.6.4.0-91/ -name "mysql-connector*"
/usr/hdp/2.6.4.0-91/hive2/lib/mysql-connector-java-5.1.45-bin.jar
/usr/hdp/2.6.4.0-91/hive2/lib/mysql-connector-java.jar
/usr/hdp/2.6.4.0-91/hive/lib/mysql-connector-java.jar
Should you want to validate the version, copy the jar to a temp directory and extract it with the jar -xvf command. You should be able to do so using:

[root@xlautomation-1]# mkdir abc
[root@xlautomation-1]# mv mysql-connector-java.jar ./abc
[root@xlautomation-1]# /usr/jdk64/jdk1.8.0_112/bin/jar -xvf mysql-connector-java.jar
[root@xlautomation-1]# cd META-INF/
[root@xlautomation-1]# head /tmp/abc/META-INF/MANIFEST.MF
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 4.4.6 20120305 (Red Hat 4.4.6-4) (Free Software Foundation
, Inc.)
Built-By: mockbuild
Bundle-Vendor: Sun Microsystems Inc.
Bundle-Classpath: .
Bundle-Version: 5.1.17
Bundle-Name: Sun Microsystems' JDBC Driver for MySQL
Bundle-ManifestVersion: 2
The default MySQL connector for HiveServer2 comes with a higher version; you can validate this using:

[root@xlautomation-1 META-INF]# head /tmp/bcd/META-INF/MANIFEST.MF
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.8.2
Created-By: 1.8.0_92-b14 (Oracle Corporation)
Built-By: pb2user
Specification-Title: JDBC
Specification-Version: 4.2
Specification-Vendor: Oracle Corporation
Implementation-Title: MySQL Connector Java
Implementation-Version: 5.1.45
Implementation-Vendor-Id: com.mysql
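A quick way to decide whether a given manifest indicates an affected driver can be sketched as below. The 5.1.26 threshold is the fix version noted above; `sort -V` does the version comparison, and the `MANIFEST` value here is a stand-in for the line pulled from a real jar (e.g. via `unzip -p mysql-connector-java.jar META-INF/MANIFEST.MF`).

```shell
# Bundle-Version line as it appears in a connector jar's MANIFEST.MF
MANIFEST='Bundle-Version: 5.1.17'
VER=$(printf '%s\n' "$MANIFEST" | sed -n 's/^Bundle-Version: *//p')
FIXED=5.1.26   # first release with the fix for MySQL bug #66659
# sort -V orders version strings numerically; if $VER sorts first and is
# not equal to $FIXED, the driver predates the fix:
if [ "$(printf '%s\n%s\n' "$VER" "$FIXED" | sort -V | head -n1)" = "$VER" ] && [ "$VER" != "$FIXED" ]; then
  echo "connector $VER predates $FIXED: affected"
else
  echo "connector $VER is $FIXED or newer: OK"
fi
# prints: connector 5.1.17 predates 5.1.26: affected
```

Running the same check against the HiveServer2 copy (5.1.45 above) lands in the "OK" branch.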
02-02-2018 08:01 PM
Issue: Soon after starting Druid services, the coordinator keeps going down. This is mostly experienced on a new setup. The exception in the coordinator log looks like this:

2018-02-02T00:37:24,123 ERROR [main] io.druid.cli.CliCoordinator - Error when starting up. Failing.
org.skife.jdbi.v2.exceptions.CallbackFailedException: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'druid.druid_rules' doesn't exist [statement:"SELECT id from druid_rules where datasource=:dataSource", located:"SELECT id from druid_rules where datasource=:dataSource", rewritten:"SELECT id from druid_rules where datasource=?", arguments:{ positional:{}, named:{dataSource:'_default'}, finder:[]}]
at org.skife.jdbi.v2.DBI.withHandle(DBI.java:284) ~[jdbi-2.63.1.jar:2.63.1]
at io.druid.metadata.SQLMetadataRuleManager.createDefaultRule(SQLMetadataRuleManager.java:83) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataRuleManagerProvider$1.start(SQLMetadataRuleManagerProvider.java:72) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:263) ~[java-util-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:156) ~[druid-api-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:103) [druid-services-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.cli.ServerRunnable.run(ServerRunnable.java:41) [druid-services-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
Caused by: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'druid.druid_rules' doesn't exist [statement:"SELECT id from druid_rules where datasource=:dataSource", located:"SELECT id from druid_rules where datasource=:dataSource", rewritten:"SELECT id from druid_rules where datasource=?", arguments:{ positional:{}, named:{dataSource:'_default'}, finder:[]}]
at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1334) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.Query.fold(Query.java:173) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.Query.list(Query.java:82) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.Query.list(Query.java:75) ~[jdbi-2.63.1.jar:2.63.1]
at io.druid.metadata.SQLMetadataRuleManager$1.withHandle(SQLMetadataRuleManager.java:97) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataRuleManager$1.withHandle(SQLMetadataRuleManager.java:85) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at org.skife.jdbi.v2.DBI.withHandle(DBI.java:281) ~[jdbi-2.63.1.jar:2.63.1]

Cause

A possible cause is that the druid database was created manually in MySQL and uses "latin1" as the default character set.

Resolution

This is a two-step resolution. First, change the character set of the backend DB to "utf8":

mysql> alter database druid character set utf8 collate utf8_general_ci;
Query OK, 1 row affected (0.01 sec)

mysql> use druid;
Database changed

mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.02 sec)

Once you restart the Druid coordinator after this change, it may start up; however, you might still get a similar error (such as missing tables, or a stack like this):

2018-02-02T19:42:06,305 WARN [main] io.druid.java.util.common.RetryUtils - Failed on try 9, retrying in 51,342ms.
org.skife.jdbi.v2.exceptions.CallbackFailedException: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: Could not clean up [statement:"SELECT @@character_set_database = 'utf8'", located:"SELECT @@character_set_database = 'utf8'", rewritten:"SELECT @@character_set_database = 'utf8'", arguments:{ positional:{}, named:{}, finder:
[]}]
at org.skife.jdbi.v2.DBI.withHandle(DBI.java:284) ~[jdbi-2.63.1.jar:2.63.1]
at io.druid.metadata.SQLMetadataConnector$2.call(SQLMetadataConnector.java:130) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) [java-util-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) [java-util-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:134) [druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:143) [druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataConnector.createTable(SQLMetadataConnector.java:184) [druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataConnector.createRulesTable(SQLMetadataConnector.java:282) [druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataConnector.createRulesTable(SQLMetadataConnector.java:476) [druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataRuleManagerProvider$1.start(SQLMetadataRuleManagerProvider.java:71) [druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:263) [java-util-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:156) [druid-api-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:103) [druid-services-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.cli.ServerRunnable.run(ServerRunnable.java:41) [druid-services-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
Caused by: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: Could not clean up [statement:"SELECT @@character_set_database = 'utf8'", located:"SELECT @@character_set_database = 'utf8'", rewritten:"SELECT @@character_set_database = 'utf8'", arguments:{ positional:{}, named:{}, finder:[]}]
at org.skife.jdbi.v2.BaseStatement.cleanup(BaseStatement.java:105) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.Query.fold(Query.java:191) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.Query.first(Query.java:273) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.Query.first(Query.java:264) ~[jdbi-2.63.1.jar:2.63.1]
at io.druid.metadata.storage.mysql.MySQLConnector.tableExists(MySQLConnector.java:101) ~[?:?]
at io.druid.metadata.SQLMetadataConnector$4.withHandle(SQLMetadataConnector.java:190) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at io.druid.metadata.SQLMetadataConnector$4.withHandle(SQLMetadataConnector.java:186) ~[druid-server-0.10.1.2.6.4.0-91.jar:0.10.1.2.6.4.0-91]
at org.skife.jdbi.v2.DBI.withHandle(DBI.java:281) ~[jdbi-2.63.1.jar:2.63.1]
... 14 more

The issue is that Druid uses a slightly older JDBC jar (5.1.14):

/usr/hdp/current/druid-coordinator/extensions/mysql-metadata-storage/mysql-jdbc-driver.jar

Second, download 5.1.45 from here and replace the existing mysql-jdbc-driver.jar with it:

cp mysql-connector-java-5.1.45-bin.jar /usr/hdp/2.6.4.0-91/druid/extensions/mysql-metadata-storage/mysql-jdbc-driver.jar

Restarting the Druid coordinator service after this change should work.
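The swap can be sketched as follows. This is a simulation in a temp directory (the jar contents are stand-in strings); on a real host the target is the mysql-metadata-storage extension path quoted above, and keeping a backup of the bundled driver is cheap insurance.

```shell
WORK=$(mktemp -d)
# stand-in for /usr/hdp/2.6.4.0-91/druid/extensions/mysql-metadata-storage
EXT="$WORK/extensions/mysql-metadata-storage"
mkdir -p "$EXT"
echo "old-5.1.14" > "$EXT/mysql-jdbc-driver.jar"    # stand-in for the bundled 5.1.14 driver
echo "new-5.1.45" > "$WORK/mysql-connector-java-5.1.45-bin.jar"
# Back up the bundled driver, then drop the new connector in under the
# exact name Druid loads (mysql-jdbc-driver.jar):
mv "$EXT/mysql-jdbc-driver.jar" "$EXT/mysql-jdbc-driver.jar.bak"
cp "$WORK/mysql-connector-java-5.1.45-bin.jar" "$EXT/mysql-jdbc-driver.jar"
cat "$EXT/mysql-jdbc-driver.jar"
# then restart the druid coordinator service
```
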
01-26-2018 11:01 PM
Simple illustration of locking in Hive when ACID is enabled

Considerations for illustration

Cluster Version: HDP-2.5.6.0
Hive Version: Hive 1.2.1000

Enabled with the following properties in place:
hive.support.concurrency=true
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager (if non-Ambari cluster)

Types of Supported Locks
S = SHARED or SHARED_READ
X = EXCLUSIVE

Tables used for testing

- orc_tab (ORC format table with col1 int and col2 string), non-transactional
- orc_tab_bucketed (ORC format table with col1 int and col2 string, transactional)
- txt_tab (TEXT format table with col1 int, col2 string, non-transactional, for loading purposes)

Each table holds close to 5 GB of data on a single-node cluster.

SCENARIO 1 (Non-Transactional Table): SELECT blocks ALTER

SELECT starts first, followed by ALTER.

Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab order by col1 limit 2"

Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "ALTER TABLE orc_tab ADD COLUMNS (col3 string)"

Session C
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| lockid | database | table | partition | lock_state | blocked_by | lock_type | transaction_id | last_heartbeat | acquired_at | user | hostname | agent_info |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info |
| 31.1 | default | orc_tab | NULL | ACQUIRED | | SHARED_READ | NULL | 1517003062122 | 1517003062122 | hive | xlpatch.openstacklocal | hive_20180126214422_aaeb4b28-5170-4131-b509-ef0213c8b842 |
| 32.1 | default | orc_tab | NULL | WAITING | 31.1 | EXCLUSIVE | NULL | 1517003063314 | NULL | hive | xlpatch.openstacklocal | hive_20180126214422_a65af104-05d1-4c19-ab54-7bb37b4cdbfa |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+

SCENARIO 2 (Non-Transactional Table): SELECT blocks INSERT OVERWRITE

SELECT starts first, followed by INSERT OVERWRITE.

Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab order by col1 limit 2"

Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "INSERT OVERWRITE TABLE orc_tab SELECT col1,col2 from txt_tab"

Session C
0: jdbc:hive2://localhost:10000/default> show locks;
| Lock ID  | Database  | Table    | Partition  | State     | Blocked By  | Type         | Transaction ID  | Last Heartbeat  | Acquired At    | User  | Hostname                | Agent Info |
| 36.1     | default   | orc_tab  | NULL       | ACQUIRED  |             | SHARED_READ  | NULL            | 1517003567582   | 1517003567582  | hive  | xlpatch.openstacklocal  | hive_20180126215247_7537e30b-d5bf-4fc8-aa23-8e860efe1ac8 |
| 37.1     | default   | txt_tab  | NULL       | WAITING   |             | SHARED_READ  | NULL            | 1517003568897   | NULL           | hive  | xlpatch.openstacklocal  | hive_20180126215248_875685ed-a552-4009-892c-e13c61cf7eb5 |
| 37.2     | default   | orc_tab  | NULL       | WAITING   | 36.1        | EXCLUSIVE    | NULL            | 1517003568897   | NULL           | hive  | xlpatch.openstacklocal  | hive_20180126215248_875685ed-a552-4009-892c-e13c61cf7eb5 |

SCENARIO 3 (Non-Transactional Table): SELECT blocks INSERT

SELECT starts first, followed by INSERT.

Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab order by col1 limit 2"

Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "INSERT INTO orc_tab SELECT col1,col2 from txt_tab limit 20"

Session C
0: jdbc:hive2://localhost:10000/default> show locks;
| Lock ID  | Database  | Table    | Partition  | State     | Blocked By  | Type         | Transaction ID  | Last Heartbeat  | Acquired At    | User  | Hostname                | Agent Info |
| 38.1     | default   | orc_tab  | NULL       | ACQUIRED  |             | SHARED_READ  | NULL            | 1517004119030   | 1517004119030  | hive  | xlpatch.openstacklocal  | hive_20180126220158_775842e7-5e34-42d0-b574-874076fd5204 |
| 39.1     | default   | txt_tab  | NULL       | WAITING   |             | SHARED_READ  | NULL            | 1517004120971   | NULL           | hive  | xlpatch.openstacklocal  | hive_20180126220200_9e9eeb8c-9c32-42fd-8ddf-c96f08699224 |
| 39.2     | default   | orc_tab  | NULL       | WAITING   | 38.1        | EXCLUSIVE    | NULL            | 1517004120971   | NULL           | hive  | xlpatch.openstacklocal  | hive_20180126220200_9e9eeb8c-9c32-42fd-8ddf-c96f08699224 |
4 rows selected (0.028 seconds)

SCENARIO 4 (Transactional Table): SELECT does not block INSERT

SELECT starts first, followed by INSERT.

Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab_bucketed order by col1 limit 2"

Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "INSERT INTO orc_tab_bucketed SELECT col1,col2 from txt_tab limit 20"

Session C
0: jdbc:hive2://localhost:10000/default> show locks;
| Lock ID  | Database  | Table             | Partition  | State     | Blocked By  | Type         | Transaction ID  | Last Heartbeat  | Acquired At    | User  | Hostname                | Agent Info |
| 42.1     | default   | orc_tab_bucketed  | NULL       | ACQUIRED  |             | SHARED_READ  | NULL            | 1517004495025   | 1517004495025  | hive  | xlpatch.openstacklocal  | hive_20180126220814_cae3893a-8e97-49eb-8b07-a3a60c4a6dc2 |
| 43.1     | default   | txt_tab           | NULL       | ACQUIRED  |             | SHARED_READ  | 3               | 0               | 1517004495874  | hive  | xlpatch.openstacklocal  | hive_20180126220815_a335e284-476a-42e0-b758-e181e6ab44e9 |
| 43.2     | default   | orc_tab_bucketed  | NULL       | ACQUIRED  |             | SHARED_READ  | 3               | 0               | 1517004495874  | hive  | xlpatch.openstacklocal  | hive_20180126220815_a335e284-476a-42e0-b758-e181e6ab44e9 |
4 rows selected (0.02 seconds)

SCENARIO 5 (Transactional Table): SELECT does not block ALTER

SELECT starts first, followed by ALTER.

Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab_bucketed order by col1 limit 2"

Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "ALTER TABLE orc_tab_bucketed ADD COLUMNS (col3 string)"

Session C
0: jdbc:hive2://localhost:10000/default> show locks;
Getting log thread is interrupted, since query is done!
| Lock ID  | Database  | Table             | Partition  | State     | Blocked By  | Type         | Transaction ID  | Last Heartbeat  | Acquired At    | User  | Hostname                | Agent Info |
| 53.1     | default   | orc_tab_bucketed  | NULL       | ACQUIRED  |             | SHARED_READ  | NULL            | 1517005855005   | 1517005855005  | hive  | xlpatch.openstacklocal  | hive_20180126223053_db2d0054-6cb6-48fb-b732-6ca677007695 |
| 54.1 | default |
orc_tab_bucketed | NULL | WAITING | 53.1 | EXCLUSIVE | NULL | 1517005855870 | NULL | hive | xlpatch.openstacklocal | hive_20180126223054_6294af5a-15da-4178-9a83-40f150e08cb1 |+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+3 rows selected (0.064 seconds) Synopsis Without "transactional" feature set to true
EXCLUSIVE lock (ALTER) waits for SHARED (SELECT) EXCLUSIVE lock (INSERT OVERWRITE) waits for SHARED (SELECT) EXCLUSIVE lock (INSERT) waits for SHARED (SELECT) With "transactional" enabled
EXCLUSIVE lock (ALTER) waits for SHARED (SELECT) INSERT/SELECT both take SHARED lock
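The transactional behaviour above assumes the table was created as an ACID table. A minimal DDL sketch (column names, types and bucket count are illustrative; this also requires hive.support.concurrency=true and hive.txn.manager=DbTxnManager):

```sql
-- Illustrative ACID table definition for the lock tests above
CREATE TABLE orc_tab_bucketed (
  col1 STRING,
  col2 STRING
)
CLUSTERED BY (col1) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```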
09-27-2017
10:23 PM
3 Kudos
Goal

Create a new Ambari view for Hive Interactive. Use this link to get detailed information on configuring views: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-views/content/settings_and_cluster_configuration.html

Steps
Navigate to Ambari page with admin privileges and click on the username dropdown icon
Select the views link to explore all the views available in Ambari
Expand the "Hive" dropdown and click on "Create Instance" to create a new view for LLAP/Interactive
Give the name of this instance per your requirement
Ensure that under "Settings" tab, "User Interactive Mode" is set to true
If the cluster is kerberized, use proper auth method and principal name
Also update the JDBC URL with the proper principal name. NOTE: If Ranger is enabled, ensure that the user trying to access the database objects has the permissions to browse the contents of the database(s).
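For reference, a kerberized JDBC URL for HiveServer2 Interactive typically looks like the following (hostname and realm are placeholders; 10500 is the usual HiveServer2 Interactive port, so verify yours in Ambari):

```
jdbc:hive2://<hsi-host>:10500/default;principal=hive/_HOST@EXAMPLE.COM
```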
08-03-2017
08:41 PM
6 Kudos
Goal: Demonstrate how to change the database location in HDFS and the Metastore

There are circumstances in which we may consider moving a database location. By default, the location for the default and custom databases is defined by the value of hive.metastore.warehouse.dir, which is /apps/hive/warehouse. Here are the illustrated steps to change a custom database location, for instance "dummy.db", along with the contents of the database.

Verify the details of the database we would like to move to a new location:

[hive@xlautomation-2 ~]$ beeline -u "jdbc:hive2://xlautomation-2.h.c:10000/default;principal=hive/xlautomation-2.h.c@H.C"
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create database dummy;
No rows affected (0.394 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db | hive | USER | |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.561 seconds)
NOTE: The describe output shows the current database location, i.e. /apps/hive/warehouse/dummy.db, which needs to be updated. Create a dummy table so we can later verify that the location update was indeed successful:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create table dummy.test123 (col1 string, col2 string) row format delimited fields terminated by ',' stored as textfile;
No rows affected (0.691 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into dummy.test123 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
INFO : Session is already open
INFO : Dag name: insert into dummy.tes...3),(4,4),(5,5),(6,6)(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0034)
INFO : Loading data to table dummy.test123 from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db/test123/.hive-staging_hive_2017-08-03_16-20-11_965_647196527379814552-1/-ext-10000
INFO : Table dummy.test123 stats: [numFiles=1, numRows=6, totalSize=24, rawDataSize=18]
No rows affected (2.47 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1 | test123.col2 |
+---------------+---------------+--+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---------------+---------------+--+
6 rows selected (0.451 seconds)
Create a new storage directory of our choice (we used newdummy.db) and replicate the permissions at the directory level.

[hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -chmod 777 /apps/hive/warehouse/newdummy.db
Verify that the DB (directory) level permissions are the same:

[hive@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse | egrep dummy.db
drwxrwxrwx - hive hdfs 0 2017-08-03 16:19 /apps/hive/warehouse/dummy.db
drwxrwxrwx - hive hdfs 0 2017-08-03 16:27 /apps/hive/warehouse/newdummy.db
Copy all the underlying contents from /apps/hive/warehouse/dummy.db/ into the new directory:

[hive@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Caution: Using "cp" with "-p" to preserve permissions is prone to the following error:

cp: Access time for hdfs is not configured. Please set dfs.namenode.accesstime.precision configuration parameter.

This is because the value of dfs.namenode.accesstime.precision is set to 0 by default in the Hortonworks HDP distribution. Since this is a client-level configuration, it can be set in hdfs-site.xml (on a non-Ambari-managed cluster, directly on the client), e.g. from 0 to 3600000. We can verify it at the client level by running the following command:

[hive@xlautomation-2 ~]$ hdfs getconf -confKey dfs.namenode.accesstime.precision
3600000
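On a cluster where this is not managed by Ambari, the setting above can be placed in the client's hdfs-site.xml as follows (3600000 ms, i.e. one hour, is the value verified above):

```xml
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
</property>
```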
Once the change is made, copy the contents of the database folder /dummy.db/* to the new location, i.e. /newdummy.db/, as the HDFS user. We are overwriting (-f) any existing files within the new directory and (-p) preserving the permissions:

[hdfs@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Check the permissions once the copy is completed:

[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/dummy.db/
Found 1 items
drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/dummy.db/test123
[hdfs@xlautomation-2 ~]$
[hdfs@xlautomation-2 ~]$
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/newdummy.db/
Found 1 items
drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/newdummy.db/test123
With privileged user access to the metastore DB (hive in our case), we may need to update three tables, i.e. DBS, SDS and FUNC_RU, as they record the locations for databases, tables and functions respectively. In our example, since we do not have any functions, we will just update the SDS and DBS tables:

mysql> update SDS set location= replace(location,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where location like '%dummy.db%';
Query OK, 3 rows affected (0.53 sec)
Rows matched: 3 Changed: 3 Warnings: 0
mysql> update DBS set db_location_uri= replace(db_location_uri,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where db_location_uri like '%dummy.db%';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
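To make the metastore updates reversible, they can be wrapped in a transaction. A sketch with the same REPLACE expressions as above (inspect the result before deciding to commit):

```sql
START TRANSACTION;

UPDATE SDS
   SET location = REPLACE(location,
        'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db',
        'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db')
 WHERE location LIKE '%dummy.db%';

UPDATE DBS
   SET db_location_uri = REPLACE(db_location_uri,
        'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db',
        'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db')
 WHERE db_location_uri LIKE '%dummy.db%';

-- Inspect the rows, then make the change permanent:
COMMIT;
-- or undo it instead:
-- ROLLBACK;
```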
NOTE: If you want to try this before committing the changes in the metastore, run begin; before your UPDATE statements and commit; afterwards (or rollback; to undo them). These UPDATE statements replace all occurrences of the specified string within the DBS and SDS tables.

Check that the changes made to the tables are permanent; the location should be updated to */newdummy.db:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db | hive | USER | |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.444 seconds)
Verify the data from the table and also confirm its location:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe formatted dummy.test123;
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| col1 | string | |
| col2 | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | dummy | NULL |
| Owner: | hive | NULL |
| CreateTime: | Thu Aug 03 16:19:33 UTC 2017 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db/test123 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | numFiles | 1 |
| | numRows | 6 |
| | rawDataSize | 18 |
| | totalSize | 24 |
| | transient_lastDdlTime | 1501777214 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
33 rows selected (0.362 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1 | test123.col2 |
+---------------+---------------+--+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---------------+---------------+--+
6 rows selected (0.275 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Considerations

Remove the old database directory only when you are sure the tables are readable.

To check whether hive or another privileged user has access to modify contents in the metastore database, login to mysql and run the following commands (ensure that you are logged on to the node that hosts the metastore database):

mysql> show grants for hive;
+--------------------------------------------------------------------------------------------------------------+
| Grants for hive@% |
+--------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY PASSWORD '*7ACE763ED393514FE0C162B93996ECD195FFC4F5' |
| GRANT ALL PRIVILEGES ON `hive`.* TO 'hive'@'%' |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
mysql> select user,host from user;
+------+--------------------+
| user | host |
+------+--------------------+
| hive | % |
| root | 127.0.0.1 |
| root | localhost |
| root | xlautomation-2.h.c |
+------+--------------------+
4 rows in set (0.00 sec)
All the operations mentioned above were performed on a kerberized cluster.
hive --service metatool -updateLocation did not succeed in updating the location; it is successful when changing the namenode URI to the HA short-name configuration.
External tables whose locations are different should ideally not have their access affected.
Copy the output of "hdfs dfs -ls -R /apps/hive/warehouse/dummy.db" to ensure that you have a record of the permissions before getting rid of the directory.
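The metatool attempt mentioned above generally follows this pattern (illustrative command lines; -listFSRoot shows the current filesystem roots, and -dryRun previews the change without applying it):

```
hive --service metatool -listFSRoot
hive --service metatool -updateLocation hdfs://mycluster hdfs://xlautomation-1.h.c:8020 -dryRun
```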
08-02-2017
10:14 PM
2 Kudos
Goal: Understand why statistics are useful in Hive

Table with stats

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(*) from mytable;
+---------+--+
| _c0 |
+---------+--+
| 843280 |
+---------+--+
1 row selected (0.332 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Here the row count is available in the metastore, so there is no need to launch map tasks just to count the rows in the table. Look-ups are faster because reading from the metastore is always quicker than launching map tasks.

Without column statistics

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
INFO : Session is already open
INFO : Dag name: select count(col1) from abcd(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (2.109 seconds) << Takes about 2 seconds
After updating the statistics:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table abcd compute statistics for columns col1;
INFO : Session is already open
INFO : Dag name: analyze table abcd compute statistics...col1(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
No rows affected (2.61 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (0.344 seconds) <<< Runs within 1/3 of a second
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
When to run ANALYZE or gather statistics

If the variation in the data is too large, say 30% or more (depending on what is acceptable based on runtimes), we can choose to run ANALYZE. In this example, the change in the dataset is almost 200%:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into abcd select * from mytable where col1 > 5000 limit 2000;
INFO : Session is already open
INFO : Dag name: insert into abcd select * from mytabl...2000(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 6 6 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 16.49 s
--------------------------------------------------------------------------------
INFO : Loading data to table default.abcd from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/abcd/.hive-staging_hive_2017-08-02_20-39-39_024_4775642732421672051-1/-ext-10000
INFO : Table default.abcd stats: [numFiles=1, numRows=1000, totalSize=128640, rawDataSize=127640]
No rows affected (17.647 seconds)

Running the query after this variation in the data, the runtime is worse for the same query that ran faster before:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
INFO : Session is already open
INFO : Dag name: select count(col1) from abcd(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (3.284 seconds) <<<<< Time increased
Let's update the statistics:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table abcd compute statistics for columns col1;
INFO : Session is already open
INFO : Dag name: analyze table abcd compute statistics...col1(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
No rows affected (3.374 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (0.346 seconds) <<<<<< Back to almost quarter of a second to fetch the same data from metastore.
Which column to pick

It is not necessary to gather statistics on all the columns; we can choose the columns that are actually used in queries. We can verify whether stats are being collected for a column by looking at the explain plan:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> explain select count(col2) from abcd;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Plan not optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 |
| File Output Operator [FS_192] |
| compressed:false |
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} |
| Group By Operator [GBY_190] |
| | aggregations:["count(VALUE._col0)"] |
| | outputColumnNames:["_col0"] |
| | Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| |<-Map 1 [SIMPLE_EDGE] |
| Reduce Output Operator [RS_189] |
| sort order: |
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| value expressions:_col0 (type: bigint) |
| Group By Operator [GBY_188] |
| aggregations:["count(col2)"] |
| outputColumnNames:["_col0"] |
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Select Operator [SEL_187] |
| outputColumnNames:["col2"] |
| Statistics:Num rows: 1000 Data size: 127640 Basic stats: COMPLETE Column stats: NONE |
| TableScan [TS_186] |
| alias:abcd |
| Statistics:Num rows: 1000 Data size: 127640 Basic stats: COMPLETE Column stats: NONE |
| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
34 rows selected (0.384 seconds)
Once the stats are gathered, the plan is simplified:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> explain select count(col2) from abcd;
+-----------------------------+--+
| Explain |
+-----------------------------+--+
| Plan not optimized by CBO. |
| |
| Stage-0 |
| Fetch Operator |
| limit:1 |
| |
+-----------------------------+--+
6 rows selected (0.306 seconds)
Considerations for Statistics

Automatic stats gathering is enabled by default and can be verified using:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> set hive.stats.autogather;
+------------------------------+--+
| set |
+------------------------------+--+
| hive.stats.autogather=true |
+------------------------------+--+
1 row selected (0.03 seconds)
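For large batch loads it is sometimes useful to switch autogather off for the session and run ANALYZE once afterwards. A sketch (table name is illustrative):

```sql
-- Session-level toggle; stats are gathered once at the end instead of per-insert
SET hive.stats.autogather=false;
-- ... bulk INSERT / LOAD statements here ...
ANALYZE TABLE abcd COMPUTE STATISTICS;
SET hive.stats.autogather=true;
```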
Stats can be manually gathered using ANALYZE at both the table and column level (one, more or all columns):

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table zzzz compute statistics;
INFO : Session is already open
INFO : Dag name: analyze table zzzz compute statistics(Stage-0)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0033)
INFO : Table default.zzzz stats: [numFiles=1, numRows=1000, totalSize=128640, rawDataSize=127640]
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 2.07 s
--------------------------------------------------------------------------------
No rows affected (21.615 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table zzzz compute statistics for columns;
INFO : Session is already open
INFO : Dag name: analyze table zzzz compute statist...columns(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0033)
No rows affected (4.626 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table zzzz compute statistics for columns col1;
INFO : Session is already open
INFO : Dag name: analyze table zzzz compute statistics...col1(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0033)
No rows affected (3.299 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Stats can be gathered for specific partitions and partition columns:

ANALYZE TABLE zzz PARTITION (idate='2017-07-29') COMPUTE STATISTICS;

Other parameters include NOSCAN and CACHE METADATA. When NOSCAN is specified, only the number of physical files and their size in bytes are gathered for statistics. CACHE METADATA is relevant when HBase is being used to store the temporary metadata.

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> ANALYZE TABLE zzzz compute statistics NOSCAN;
INFO : Table default.zzzz stats: [numFiles=1, numRows=1000, totalSize=128640, rawDataSize=127640]
No rows affected (0.455 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Reference https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-StatisticsinHive
05-04-2017
08:27 PM
GOAL

To change the default log location for HUE from /var/log/hue to elsewhere.

Steps

By default the HUE service writes its log files to the /var/log/hue directory. This is enforced in the code, which states:

The ``log_dir`` will replace the %LOG_DIR% in log.conf. If not specified, we look for the DESKTOP_LOG_DIR environment variable, and then default to the DEFAULT_LOG_DIR.

However, setting either variable, i.e. DESKTOP_LOG_DIR or DEFAULT_LOG_DIR, does not seem to work. We can, however, replace %LOG_DIR% with an absolute path within /etc/hue/conf/log.conf. Here is the output of a log.conf with the log directory replaced by an absolute path.
args=('/opt/log/hue/access.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/error.log', 'a', 1000000, 3)
args=('/opt/log/hue/error.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/%PROC_NAME%.log', 'a', 1000000, 3)
args=('/opt/log/hue/%PROC_NAME%.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/shell_output.log', 'a', 1000000, 3)
args=('/opt/log/hue/shell_output.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/shell_input.log', 'a', 1000000, 3)
args=('/opt/log/hue/shell_input.log', 'a', 1000000, 3)
NOTE: Ensure that the new log directory exists and is owned by the hue user and group. After the changes, restarting the Hue service is enough to route logs to the new location.
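The edits above can be scripted. A minimal sketch that rewrites every %LOG_DIR% placeholder in a log.conf (the function name and paths are illustrative; a .bak copy is kept before editing):

```shell
# Replace every %LOG_DIR% placeholder in a Hue log.conf with an absolute path.
# Usage: replace_log_dir <path-to-log.conf> <new-log-dir>
replace_log_dir() {
  conf="$1"
  new_dir="$2"
  # Keep a backup of the original before editing in place.
  cp "$conf" "$conf.bak"
  # '|' as the sed delimiter avoids escaping the '/' in the new path.
  sed -i "s|%LOG_DIR%|${new_dir}|g" "$conf"
}
```

For example: replace_log_dir /etc/hue/conf/log.conf /opt/log/hue, followed by a restart of the Hue service.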
05-04-2017
05:44 PM
Error

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 110
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.planReadPartialDataStreams(RecordReaderImpl.java:914)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:958)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:793)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rows(ReaderImpl.java:585)
at org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaDataImpl(FileDump.java:291)
at org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaData(FileDump.java:261)
at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Cause

The issue stems from a missing enhancement in RecordReaderImpl.java in 2.3.4 and lower versions when reading a stream of data. The "includedColumns[column]" check (around line 916) fails because the size of the output exceeds the size of the array variable.

Resolution

This issue was fixed in 2.4 and later versions. It is possible that an intermediate fix is available in one of the versions higher than 2.3.4.7, but it is safer to upgrade to 2.4.x or, better, 2.6.
04-14-2017
02:58 PM
2 Kudos
Steps to Create Table in Hive on S3A with Ranger

Create a bucket with a unique name (I've used "myhivebucket") and do not change any details in the permissions. Complete the "Create bucket" wizard by clicking on the "Create bucket" button.

Make the following entries in custom hdfs-site.xml:

fs.s3a.access.key = <access key>
fs.s3a.secret.key = <access secret>
fs.s3a.impl = org.apache.hadoop.fs.s3a.S3AFileSystem

To retrieve the value for the access key and secret, follow these steps:

Login to https://aws.amazon.com/console
Click on the "Sign in to the console" tab and login with appropriate credentials
Once logged in, you should see your login name in the top right corner of the AWS page
Click on the drop-down arrow beside your login name and select "My Security Credentials"
This should take you to a page titled "Your Security Credentials"
From this page, expand the option that says "Access Keys (Access Key ID and Secret Access Key)"
You have to click on "create a new access key" because of an Amazon limitation
This lets you download the key/secret in this format (this is not case sensitive):

AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXXX
AWSSecretKey=XXXXXxxxxxXXXXXxxxxxXXXXX/xxxxx

The value for "fs.s3a.access.key" is the value of "AWSAccessKeyId"
The value for "fs.s3a.secret.key" is the value of "AWSSecretKey"

Login to the Ranger admin interface and create a policy for hive (or the desired user) to allow the desired permissions.

Now login to Hive via beeline, with Kerberos credentials as required, and create a table ensuring that the location is on s3a:

[hive@xlnode-standalone ~]$ beeline -u "jdbc:hive2://xlnode-standalone.hwx.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
WARNING: Use "yarn jar" to launch YARN applications.
Connecting to jdbc:hive2://xlnode-standalone.hwx.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.4.3.0-227)
Driver: Hive JDBC (version 1.2.1000.2.4.3.0-227)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.4.3.0-227 by Apache Hive
0: jdbc:hive2://xlnode-standalone.hwx.com:218> create table mys3test (col1 int, col2 string) row format delimited fields terminated by ',' stored as textfile location 's3a://myhivebucket/test';
No rows affected (12.04 seconds)
0: jdbc:hive2://xlnode-standalone.hwx.com:218>
Now try and insert some rows 0: jdbc:hive2://xlnode-standalone.hwx.com:218> insert into mys3test values (1,'test'),(2,'test');
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hive] does not have [UPDATE] privilege on [default/mys3test] (state=42000,code=40000)
0: jdbc:hive2://xlnode-standalone.hwx.com:218>

The above error is intentional: since we do not have the "UPDATE" privilege assigned via Ranger, we cannot insert the values yet. Allow the permission and INSERT again.

Validate INSERT/UPDATE and SELECT:

0: jdbc:hive2://xlnode-standalone.hwx.com:218> insert into mys3test values (1,'test'),(2,'test');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into mys3test ...1,'test'),(2,'test')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1492107639289_0002)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0/1
INFO : Map 1: 1/1
INFO : Loading data to table default.mys3test from s3a://myhivebucket/test/.hive-staging_hive_2017-04-13_19-27-13_226_6105571528298793138-1/-ext-10000
INFO : Table default.mys3test stats: [numFiles=1, numRows=2, totalSize=14, rawDataSize=12]
No rows affected (53.854 seconds)
0: jdbc:hive2://xlnode-standalone.hwx.com:218> select * from mys3test;
+----------------+----------------+--+
| mys3test.col1 | mys3test.col2 |
+----------------+----------------+--+
| 1 | test |
| 2 | test |
+----------------+----------------+--+
2 rows selected (3.554 seconds)
0: jdbc:hive2://xlnode-standalone.hwx.com:218>
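The custom hdfs-site.xml entries from the setup step, expressed as a configuration snippet (the key and secret values are placeholders for your own credentials):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>AWS_SECRET_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```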