Member since: 02-08-2016
793 Posts · 669 Kudos Received · 85 Solutions
12-24-2016
11:33 AM
4 Kudos
PROBLEM: The Ambari service check for Solr fails when the active NameNode is nn2. The stderr output shows the error below. ERROR: stderr: /var/lib/ambari-agent/data/errors-803.txt
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/SOLR/5.5.2.2.5/package/scripts/service_check.py", line 48, in <module>
ServiceCheck().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/SOLR/5.5.2.2.5/package/scripts/service_check.py", line 43, in service_check
user=params.solr_config_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/opt/lucidworks-hdpsearch/solr/bin/solr create_collection -c collection1 -d data_driven_schema_configs -p 8983 -s 2 -rf 1 >> /var/log/service_solr/solr-service.log 2>&1' returned 1.
The error message in the Solr log was:
2016-09-15 17:04:49,886 [qtp1192108080-19] ERROR [ ] org.apache.solr.update.SolrIndexWriter (SolrIndexWriter.java:135) - Error closing IndexWriter
java.net.ConnectException: Call From dummyhost/0.0.0.0 to dummyhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy10.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
at org.apache.solr.store.hdfs.HdfsDirectory.listAll(HdfsDirectory.java:168)
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:101)
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:426)
at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2099)
at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2041)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1083)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1125)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:130)
at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:832)
at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:85)
at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:358)
at org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:73)
at org.apache.solr.core.SolrCore.close(SolrCore.java:1225)
at org.apache.solr.core.SolrCore.closeAndWait(SolrCore.java:1015)
at org.apache.solr.core.CoreContainer.unload(CoreContainer.java:994)
at org.apache.solr.handler.admin.CoreAdminOperation$2.call(CoreAdminOperation.java:144)
at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:354)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:153)
ROOT CAUSE: This is a product defect (BUG-68180). RESOLUTION: Add the line below to the solr-config-env content in Ambari, placing it just below JAVA_HOME; this resolved the issue:
export SOLR_HDFS_CONFIG=/etc/hadoop/conf
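The mechanism behind the fix can be illustrated: with an HA nameservice, an HDFS client needs hdfs-site.xml (which SOLR_HDFS_CONFIG points to) to map the logical nameservice to the physical NameNode addresses; without it, the client cannot find the active NameNode when nn2 is active. A minimal sketch of that lookup, using a hypothetical nameservice and host names:

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml fragment for an HA nameservice "mycluster".
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>host1:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>host2:8020</value></property>
</configuration>"""

def load_props(xml_text):
    """Parse a Hadoop-style configuration file into a dict."""
    return {p.findtext("name"): p.findtext("value")
            for p in ET.fromstring(xml_text).iter("property")}

def resolve_nameservice(props, nameservice):
    """Return the candidate NameNode RPC addresses for a logical nameservice."""
    nn_ids = props[f"dfs.ha.namenodes.{nameservice}"].split(",")
    return [props[f"dfs.namenode.rpc-address.{nameservice}.{nn}"] for nn in nn_ids]

props = load_props(HDFS_SITE)
print(resolve_nameservice(props, "mycluster"))  # both NameNodes, not just nn1
```

Without this mapping available to Solr, the client falls back to contacting a single fixed host and fails with the Connection refused error seen in the log above.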
12-24-2016
11:25 AM
4 Kudos
SYMPTOM: Ranger is installed and managed using Ambari. Services fail to start with the error below. ERROR: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server.py", line 185, in <module>
HiveServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server.py", line 85, in start
setup_ranger_hive(rolling_upgrade=rolling_restart)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/setup_ranger_hive.py", line 50, in setup_ranger_hive
hdp_version_override = hdp_version)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/setup_ranger_plugin_xml.py", line 82, in setup_ranger_plugin
policy_user)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/ranger_functions.py", line 92, in create_ranger_repository
repo = self.get_repository_by_name_urllib2(repo_name, component, 'true', ambari_username_password_for_ranger)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/ranger_functions.py", line 57, in get_repository_by_name_urllib2
response = json.loads(result.read())
File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/ranger_functions.py", line 108, in create_ranger_repository
raise Fail('Ambari admin username and password are blank ')
resource_management.core.exceptions.Fail: Ambari admin username and password are blank
ROOT CAUSE: This error can occur if the user changes the admin password in the Ranger Web User Interface but neglects to also change it in Ambari's Ranger configs. The error message, however, is not very descriptive of the actual problem. A bug has been filed against Ambari to provide better error reporting for situations like this: https://issues.apache.org/jira/browse/AMBARI-13346. A fix is scheduled for Ambari 2.1.3 and higher. RESOLUTION: If the password was indeed changed in the Ranger Web UI but not in Ambari, then:
1) Log in to the Ambari Web User Interface.
2) Click on the Ranger service.
3) Click on the Configs tab for the Ranger server.
4) Locate the admin_password parameter in Ranger's Advanced ranger-env section.
5) Update the password to match what was entered in the Ranger Web Interface.
6) Save the settings, restart the Ranger service, then restart any services that were failing.
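The secondary "No JSON object could be decoded" error in the traceback comes from parsing a non-JSON response body (for example, an HTML page returned when authentication fails) as JSON. A minimal reproduction of that failure mode, mimicking the json.loads() call in ranger_functions.py:

```python
import json

def parse_repository_response(body):
    """Mimic the json.loads() call in ranger_functions.py: a non-JSON body
    (e.g. an HTML error page returned on bad credentials) raises ValueError."""
    return json.loads(body)

# A well-formed JSON repository response parses cleanly.
ok = parse_repository_response('{"id": 1, "name": "hive_repo"}')
print(ok["name"])

# A non-JSON body (what Ranger returns when the stored credentials are stale)
# raises ValueError, exactly as in the stack trace above.
try:
    parse_repository_response("<html><body>Authentication required</body></html>")
except ValueError as e:
    print("decode failed as in the stack trace:", e)
```

This is why the stale-password symptom surfaces first as a JSON decoding error rather than an authentication error.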
12-24-2016
11:10 AM
4 Kudos
SYMPTOM: The Ambari Hive View does not reconnect when its connection to the Metastore is interrupted. ERROR: 03 May 2016 13:33:39,172 ERROR qtp-ambari-client-41951 ServiceFormattedException:96 - org.apache.ambari.view.hive.client.HiveClientException: H100 Unable to submit statement show databases like '*': org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
org.apache.ambari.view.hive.client.HiveClientException: H100 Unable to submit statement show databases like '*': org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.ambari.view.hive.client.Connection$3.body(Connection.java:608)
at org.apache.ambari.view.hive.client.Connection$3.body(Connection.java:590)
at org.apache.ambari.view.hive.client.HiveCall.call(HiveCall.java:101)
at org.apache.ambari.view.hive.client.Connection.execute(Connection.java:590)
at org.apache.ambari.view.hive.client.Connection.executeSync(Connection.java:629)
at org.apache.ambari.view.hive.client.DDLDelegator.getDBListCursor(DDLDelegator.java:76)
at org.apache.ambari.view.hive.client.DDLDelegator.getDBList(DDLDelegator.java:65)
at org.apache.ambari.view.hive.resources.browser.HiveBrowserService.databases(HiveBrowserService.java:88)
at sun.reflect.GeneratedMethodAccessor759.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.apache.ambari.server.security.authorization.AmbariAuthorizationFilter.doFilter(AmbariAuthorizationFilter.java:196)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:150)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:237)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:167)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.apache.ambari.server.api.MethodOverrideFilter.doFilter(MethodOverrideFilter.java:72)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:47)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.apache.ambari.server.security.AbstractSecurityHeaderFilter.doFilter(AbstractSecurityHeaderFilter.java:109)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.apache.ambari.server.security.AbstractSecurityHeaderFilter.doFilter(AbstractSecurityHeaderFilter.java:109)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:216)
at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:205)
at org.apache.ambari.server.controller.AmbariHandlerList.handle(AmbariHandlerList.java:152)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
at org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:471)
at org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at org.apache.hive.service.cli.thrift.TCLIService$Client.send_ExecuteStatement(TCLIService.java:219)
at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:211)
at org.apache.ambari.view.hive.client.Connection$3.body(Connection.java:606)
... 97 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
ROOT CAUSE: The only way we have found to get a disrupted Hive View connection working again is to restart the Ambari server, which forces the view to create a fresh connection. There should be a way for the view to reconnect automatically after a set interval, or even a reconnect button for the user; restarting the Ambari server is not a viable solution. This is a known defect (https://hortonworks.jira.com/browse/BUG-57145) in Ambari 2.2.
RESOLUTION: Upgrading to Ambari 2.4 fixed the issue. As a workaround, restart the Ambari server.
12-24-2016
07:17 AM
3 Kudos
SYMPTOM: A Pig script runs fine in grunt, but fails when executed in the Hue Pig editor. The script is as follows:
A = load '/tmp/baseball';
dump A;
ERROR: Below are the error logs - ROOT CAUSE: The property templeton.libjars was pointing to the wrong jar files:
/usr/hdp/${hdp.version}/zookeeper,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar/zookeeper.jar
RESOLUTION: After changing the value of templeton.libjars to
/usr/hdp/${hdp.version}/zookeeper/zookeeper.jar,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar
the script ran successfully in the Hue Pig editor. See also: https://community.hortonworks.com/articles/15958/templetonlibjars-property-changed-its-value-after.html
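A quick sanity check for this class of misconfiguration: every comma-separated entry in templeton.libjars should name an actual .jar file. A small hypothetical validator sketch, run against the bad and fixed values from this article:

```python
def invalid_libjar_entries(value):
    """Return entries of a comma-separated jar list that do not name a .jar file."""
    return [e for e in value.split(",") if not e.strip().endswith(".jar")]

# The misconfigured value: the first entry is a bare directory, and the
# second nests one jar path inside another.
bad = ("/usr/hdp/${hdp.version}/zookeeper,"
      "/usr/hdp/${hdp.version}/hive/lib/hive-common.jar/zookeeper.jar")

# The corrected value: each entry points directly at a jar.
good = ("/usr/hdp/${hdp.version}/zookeeper/zookeeper.jar,"
       "/usr/hdp/${hdp.version}/hive/lib/hive-common.jar")

print(invalid_libjar_entries(bad))   # flags the bare zookeeper directory
print(invalid_libjar_entries(good))  # [] -- every entry ends in .jar
```

Note this simple suffix check catches the bare-directory entry but not the nested-jar entry (which still ends in .jar); a fuller check would also verify each path exists on disk.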
12-24-2016
07:01 AM
3 Kudos
SYMPTOM: Hive jobs fail on the production aggregation cluster with "java.net.UnknownHostException: Matrix-Aggr". Matrix-Aggr is the nameservice for NameNode HA. ERROR: The error log is as follows:
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: Matrix-Aggr
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2619)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2635)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:149)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:110)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.getSchema(AvroGenericRecordReader.java:112)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:70)
at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
... 16 more
Caused by: java.net.UnknownHostException: Matrix-Aggr
... 33 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
ROOT CAUSE: HDP 2.2.4 has a bug in AvroSerdeUtils.java where the client configuration is rebuilt without the proper HA settings at the line below, which leads to the UnknownHostException:
Schema s = getSchemaFromFS(schemaString, new Configuration());
RESOLUTION: This is fixed in recent versions via HIVE-9299. As a workaround, use a file:// URL for avro.schema.url and keep the schema file on all NodeManager machines. You may need to request a patch from Hortonworks support, or upgrade HDP to the latest version.
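The workaround works because the URL scheme decides which filesystem handles the schema: a file:// URL never touches the HDFS nameservice, so the freshly built Configuration lacking HA settings no longer matters. A small illustration, using a hypothetical schema path:

```python
from urllib.parse import urlparse

# With hdfs://, the authority is the logical nameservice. A client built from
# a bare Configuration (as in the buggy AvroSerdeUtils code path) treats it as
# a literal hostname and fails with UnknownHostException.
hdfs_url = urlparse("hdfs://Matrix-Aggr/schemas/events.avsc")
print(hdfs_url.scheme, hdfs_url.netloc)  # hdfs Matrix-Aggr

# With file://, there is no remote authority to resolve at all; the path is
# read from the local disk of whichever NodeManager runs the task.
file_url = urlparse("file:///etc/avro/schemas/events.avsc")
print(file_url.scheme, file_url.netloc)
```

This is also why the schema file must be present on every NodeManager machine: each task resolves the file:// path locally.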
12-24-2016
06:38 AM
3 Kudos
SYMPTOM: During operations such as adding a service or upgrading, the Ambari UI complains that packages cannot be found for installation. The log shows that Ambari is searching a repo version higher than the cluster's current HDP version.
For example:
-- Current version : Ambari Version 1.7.0 and HDP 2.2.0
-- But Ambari is searching in repo version for 2.2.8
ROOT CAUSE: The file /var/lib/ambari-server/resources/stacks/HDP/<VERSION>/repos/repoinfo.xml has been updated with incorrect latest-version information. RESOLUTION:
1. Comment out the following line in /var/lib/ambari-server/resources/stacks/HDP/<VERSION>/repos/repoinfo.xml:
<latest>http://public-repo-1.hortonworks.com/HDP/hdp_urlinfo.json</latest>
2. Open the Ambari database and check the content of the metainfo table. For example, if the metainfo_key "repo:/HDP/2.2/redhat6/HDP-<VERSION>:baseurl" is missing, add it with:
INSERT INTO metainfo VALUES ('repo:/HDP/2.2/redhat6/HDP-<VERSION>:baseurl', 'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.2.0.0');
3. Restart the Ambari server and agents.
4. Run yum clean all on the service master hosts.
5. Re-install, or re-run the upgrade.
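The first step, commenting out the latest element, can be done by hand or scripted. A sketch that wraps the element in an XML comment (back up repoinfo.xml before editing it):

```python
import re

def comment_out_latest(xml_text):
    """Wrap any <latest>...</latest> element in an XML comment so Ambari
    stops consulting the online hdp_urlinfo.json for a newer repo version."""
    return re.sub(r"(<latest>.*?</latest>)", r"<!-- \1 -->", xml_text, flags=re.DOTALL)

sample = ("<repoinfo><latest>"
          "http://public-repo-1.hortonworks.com/HDP/hdp_urlinfo.json"
          "</latest></repoinfo>")
print(comment_out_latest(sample))
```

After this change, Ambari resolves repositories only from the baseurl entries it already knows, which is what keeps it on the cluster's actual HDP version.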
12-23-2016
07:17 PM
5 Kudos
SYMPTOM: Adding components through the Ambari UI fails. We use RHN Satellite repos to download packages. HDP.repo and HDP-UTILS.repo were configured with "enabled=0" on all servers, but they kept being reverted to "enabled=1". Below are my repos:
[HDP-2.5]
name=HDP-2.5
baseurl=http://172.26.64.249/hdp/centos6/HDP-2.5.3.0/
path=/
enabled=0
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS-1.1.0.21
baseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
path=/
enabled=0
ROOT CAUSE: Ambari regenerates the repo files from its own templates (via its Puppet-based provisioning), so it always reverts manual edits back to the original. RESOLUTION: Modify the template file for your OS; in this case it was /var/lib/ambari-server/resources/stacks/HDP/2.0.6/hooks/before-INSTALL/templates/repo_suse_rhel.j2, replacing enabled=1 with enabled=0.
After restarting the Ambari server, services installed successfully from the RHN Satellite repository.
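The template edit itself is a one-line substitution. A sketch of the change, applied here to a simplified, hypothetical stand-in for the real repo_suse_rhel.j2 content (edit the actual file on the Ambari server):

```python
def disable_repo(template_text):
    """Flip enabled=1 to enabled=0 in a repo template so the .repo files
    Ambari generates leave the repository disabled (packages then come
    from the RHN Satellite channel instead)."""
    return template_text.replace("enabled=1", "enabled=0")

# Hypothetical, simplified stand-in for the Jinja2 repo template.
template = "[{{repo_id}}]\nname={{repo_id}}\nbaseurl={{base_url}}\npath=/\nenabled=1\n"
print(disable_repo(template))
```

Because Ambari regenerates the per-host .repo files from this one template, fixing the template (rather than the generated files) is the change that survives restarts.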
12-23-2016
06:26 PM
4 Kudos
SYMPTOM: Upon starting the App Timeline Service after an Ambari & HDP upgrade, the following errors were thrown and the service was unable to start: ERROR: 2015-08-02 22:56:24,311 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore failed in state INITED; cause:
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 116 missing files; e.g.: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/001052.sst
ROOT CAUSE: Corrupted SST files in the Application Timeline Server's leveldb path. RESOLUTION: Navigate to /hadoop/yarn/timeline/leveldb-timeline-store.ldb, where you will find a text file named "CURRENT".
Back this file up to /tmp and then remove it:
cp /hadoop/yarn/timeline/leveldb-timeline-store.ldb/CURRENT /tmp
rm /hadoop/yarn/timeline/leveldb-timeline-store.ldb/CURRENT
Restart the service via Ambari
12-23-2016
06:14 PM
3 Kudos
SYMPTOM: YARN timeline logs are growing very fast and the disk is now 100% utilized. Below are the configs set for the Application Timeline Server (ATS):
<property>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.ttl-ms</name>
<value>1339200000</value>
</property>
<property>
<name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
<value>150000</value>
</property>
ROOT CAUSE: The yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms config does not affect the semantics of the ATS purging process; it only sets the time interval between two purge passes in a leveldb-based ATS store (the leveldb and rolling-leveldb storage implementations). The actual retention is controlled by yarn.timeline-service.ttl-ms, which here was set to 1339200000 ms (1339200 seconds, 372 hours, or 15.5 days). On a normal cluster with a limited disk space budget this can cause problems (roughly 13 MB per hour in this case). Reducing the TTL helps alleviate the problem. RESOLUTION: The issue was resolved by halving yarn.timeline-service.ttl-ms in the Application Timeline configuration from 1339200000 ms (15.5 days) to 669600000 ms (about 7.75 days):
<property>
<name>yarn.timeline-service.ttl-ms</name>
<value>669600000</value>
</property>
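The unit conversions behind these TTL values are easy to check:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds per day

def ms_to_days(ms):
    """Convert a millisecond TTL to days, as used by yarn.timeline-service.ttl-ms."""
    return ms / MS_PER_DAY

print(ms_to_days(1339200000))  # 15.5 days -- the original TTL
print(ms_to_days(669600000))   # 7.75 days -- half the original
```

Pick the TTL from your retention requirement and disk budget: at the observed growth rate of roughly 13 MB per hour, halving the retention window roughly halves the steady-state store size.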
12-27-2016
01:26 PM
After deleting from WF_ACTIONS/wf_jobs and COORD_ACTIONS/coord_jobs, do we need to check anything in BUNDLE_JOBS/bundle_jobs? Are there any other steps to perform to remove stale/cached entries? Irshad Ahmed