Member since 08-02-2018

- 46 Posts
- 1 Kudos Received
- 1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 10367 | 08-09-2018 03:05 AM |
			
    
	
		
		
08-02-2018 04:50 AM
Hi everyone, I have a 5-node CDH cluster. In my cluster I am observing that the NodeManagers are restarting continuously. I am not sure what is going on; I am attaching the stdout, stderr, and role log. Can you please help me?

Stderr:

+ exec /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-yarn/bin/yarn nodemanager
Aug 02, 2018 11:30:32 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Aug 02, 2018 11:30:32 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices as a root resource class
Aug 02, 2018 11:30:32 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Aug 02, 2018 11:30:32 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.server.nodemanager.webapp.JAXBContextResolver as a provider class
Aug 02, 2018 11:30:32 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Aug 02, 2018 11:30:32 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.server.nodemanager.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Aug 02, 2018 11:30:32 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Aug 02, 2018 11:30:33 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices to GuiceManagedComponentProvider with the scope "Singleton"

Role log:

11:48:42.105 AM	INFO	ContainerManagerImpl	
Start request for container_1533205969497_0410_01_000001 by user dr.who
11:48:42.105 AM	INFO	ContainerManagerImpl	
Creating a new application reference for app application_1533205969497_0410
11:48:42.105 AM	INFO	Application	
Application application_1533205969497_0410 transitioned from NEW to INITING
11:48:42.106 AM	INFO	NMAuditLogger	
USER=dr.who	IP=172.31.24.227	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1533205969497_0410	CONTAINERID=container_1533205969497_0410_01_000001
11:48:42.108 AM	INFO	AppLogAggregatorImpl	
rollingMonitorInterval is set as -1. The log rolling monitoring interval is disabled. The logs will be aggregated after this application is finished.
11:48:42.125 AM	INFO	Application	
Adding container_1533205969497_0410_01_000001 to application application_1533205969497_0410
11:48:42.125 AM	INFO	Application	
Application application_1533205969497_0410 transitioned from INITING to RUNNING
11:48:42.125 AM	INFO	Container	
Container container_1533205969497_0410_01_000001 transitioned from NEW to LOCALIZED
11:48:42.125 AM	INFO	AuxServices	
Got event CONTAINER_INIT for appId application_1533205969497_0410
11:48:42.125 AM	INFO	YarnShuffleService	
Initializing container container_1533205969497_0410_01_000001
11:48:42.144 AM	INFO	Container	
Container container_1533205969497_0410_01_000001 transitioned from LOCALIZED to RUNNING
11:48:42.147 AM	INFO	DefaultContainerExecutor	
launchContainer: [bash, /data0/yarn/nm/usercache/dr.who/appcache/application_1533205969497_0410/container_1533205969497_0410_01_000001/default_container_executor.sh]
11:48:42.162 AM	WARN	DefaultContainerExecutor	
Exit code from container container_1533205969497_0410_01_000001 is : 143
11:48:42.164 AM	INFO	Container	
Container container_1533205969497_0410_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
11:48:42.164 AM	INFO	ContainerLaunch	
Cleaning up container container_1533205969497_0410_01_000001
11:48:42.181 AM	INFO	DefaultContainerExecutor	
Deleting absolute path : /data0/yarn/nm/usercache/dr.who/appcache/application_1533205969497_0410/container_1533205969497_0410_01_000001
11:48:42.182 AM	WARN	NMAuditLogger	
USER=dr.who	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1533205969497_0410	CONTAINERID=container_1533205969497_0410_01_000001
11:48:42.182 AM	INFO	Container	
Container container_1533205969497_0410_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
11:48:42.182 AM	INFO	Application	
Removing container_1533205969497_0410_01_000001 from application application_1533205969497_0410
11:48:42.182 AM	INFO	AppLogAggregatorImpl	
Considering container container_1533205969497_0410_01_000001 for log-aggregation
11:48:42.182 AM	INFO	AuxServices	
Got event CONTAINER_STOP for appId application_1533205969497_0410
11:48:42.182 AM	INFO	YarnShuffleService	
Stopping container container_1533205969497_0410_01_000001
11:48:43.185 AM	INFO	NodeStatusUpdaterImpl	
Removed completed containers from NM context: [container_1533205969497_0410_01_000001] 
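A side note that may help when reading the role log above: by the common POSIX shell convention, container exit codes above 128 encode 128 plus a signal number, so exit code 143 means the container process was terminated by a signal rather than crashing on its own. A minimal sketch of that decoding, assuming the standard convention:

```python
# Decode a YARN container exit status under the common POSIX shell
# convention: codes above 128 encode 128 + the terminating signal number.
import signal

def describe_exit_code(code):
    """Return a human-readable guess for a container exit code."""
    if code > 128:
        signum = code - 128
        name = signal.Signals(signum).name  # e.g. SIGTERM for 15
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"

# The role log above reports exit code 143 = 128 + 15 (SIGTERM),
# i.e. the container was terminated, not a crash inside the container.
print(describe_exit_code(143))  # killed by signal 15 (SIGTERM)
```

This only tells you the container was sent SIGTERM; the container's own stdout/stderr under the appcache directory shown in the log would be needed to see why.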
						
					
Labels:
- Apache YARN
			
    
	
		
		
07-26-2018 11:11 AM
Since it is a production cluster, does it affect other services if I restart HiveServer2?
						
					
    
	
		
		
07-26-2018 09:29 AM
					
Hi everyone, in my cluster I am getting the alert "HiveServer2 process connection failed", but HiveServer2 is running. Please find the log below:

Connection failed on host abc.covert.com:10000 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 211, in execute
    ldap_password=ldap_password)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 79, in check_thrift_port_sasl
    timeout=check_command_timeout)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 297, in _call
    raise ExecuteTimeoutException(err_msg)
ExecuteTimeoutException: Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; ! beeline -u '"'"'jdbc:hive2://abc.covert.com:10000/;transportMode=binary;principal=hive/_HOST@COVERT.NET'"'"'  -e '"'"''"'"' 2>&1| awk '"'"'{print}'"'"'|grep -i -e '"'"'Connection refused'"'"' -e '"'"'Invalid URL'"'"''' was killed due timeout after 60 seconds
)

Can you please help me get rid of this? Thanks in advance.
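The traceback above ends in `ExecuteTimeoutException`: the alert's beeline probe did not return within the 60-second window (`check_command_timeout` in the trace), so the alert fires even though HiveServer2 itself is up. A minimal sketch of that mechanism, not the actual Ambari code; the commands below are placeholders:

```python
# Minimal sketch of what the alert does: run a connectivity-check command
# and treat it as failed if it exceeds a timeout, even if the service is
# actually healthy but merely slow to answer (e.g. Kerberos/DNS delays).
import subprocess

def check_with_timeout(cmd, timeout_seconds):
    """Run cmd; return True if it finishes in time, False on timeout."""
    try:
        subprocess.run(cmd, timeout=timeout_seconds,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except subprocess.TimeoutExpired:
        # This corresponds to the ExecuteTimeoutException in the traceback:
        # the probe command was killed after the alert's timeout expired.
        return False

# A command that outlives its window times out, mirroring the alert failure.
print(check_with_timeout(["sleep", "5"], 0.5))  # False
```

If the beeline probe is genuinely slow rather than broken, raising the alert's check timeout (or fixing whatever delays the probe) is the usual direction to investigate.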
						
					
		
			
				
						
Labels:
- Apache Ambari
- Apache Hive
			
    
	
		
		
07-24-2018 01:13 PM
					
Hi everyone, we are planning to upgrade our Kerberized production cluster from HDP 2.6 to HDP 3.0. Can you please tell me the step-by-step procedure and best practices, since I am doing it for the first time? Thanks in advance.
						
					
    
	
		
		
07-16-2018 09:45 AM
					
Hi everyone,

I want to get the metrics of all services in my cluster.

1) If the HDFS service goes down, I want to know how much time the HDFS service was down. E.g., in an HDP cluster we get only the service uptime, like "NameNode uptime is 25 days", and if I restart the service the uptime is calculated from there onwards. But I need to know how long the service was down and how long it has been up and running.

I am asking not only about the HDFS service; I also need to generate the report for all the services (YARN, HBase, Knox, etc.).

Can you please guide me on how to get these uptimes and downtimes for all services?

Thanks in advance.
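Whatever the source of the outage timestamps (Ambari alert history, monitoring logs, etc.), the report being asked for reduces to summing outage intervals per service. An illustrative sketch with made-up data, not tied to any particular API:

```python
# Illustrative only: given (down_at, up_at) outage intervals for a service,
# however they are collected, compute the total downtime for the report.
# The timestamps below are invented sample data.
from datetime import datetime, timedelta

def total_downtime(outages):
    """Sum the durations of (down_at, up_at) outage intervals."""
    return sum(((up - down) for down, up in outages), timedelta())

hdfs_outages = [
    (datetime(2018, 7, 14, 9, 0), datetime(2018, 7, 14, 9, 45)),   # 45 min
    (datetime(2018, 7, 15, 22, 10), datetime(2018, 7, 15, 22, 40)), # 30 min
]
print(total_downtime(hdfs_outages))  # 1:15:00
```

The same function works per service (YARN, HBase, Knox, ...) once each service's down/up transitions are recorded somewhere durable, since a service's own "uptime" counter resets on restart, as the post notes.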
						
					
		
			
				
						
Labels:
- Apache Ambari
- Apache Hadoop
			
    
	
		
		
07-10-2018 12:20 PM
					
I found the below information in gateway.log. Can anyone help me resolve this?

2018-07-10 09:55:07,535 ERROR hadoop.gateway (DefaultTopologyService.java:loadTopology(111)) - Failed to load topology /usr/hdp/2.6.1.0-129/knox/bin/../conf/topologies/sample.xml, retrying after 50ms: org.xml.sax.SAXParseException; lineNumber: 41; columnNumber: 76; The reference to entity "ServiceAccounts" must end with the ';' delimiter.

2018-07-10 09:55:07,588 ERROR digester3.Digester (Digester.java:fatalError(1541)) - Parse Fatal Error at line 41 column 76: The reference to entity "ServiceAccounts" must end with the ';' delimiter.
org.xml.sax.SAXParseException; lineNumber: 41; columnNumber: 76; The reference to entity "ServiceAccounts" must end with the ';' delimiter.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1850)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3067)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
    at org.apache.commons.digester3.Digester.parse(Digester.java:1642)
    at org.apache.commons.digester3.Digester.parse(Digester.java:1701)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.loadTopologyAttempt(DefaultTopologyService.java:124)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.loadTopology(DefaultTopologyService.java:100)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.loadTopologies(DefaultTopologyService.java:235)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.reloadTopologies(DefaultTopologyService.java:320)
    at org.apache.hadoop.gateway.GatewayServer.start(GatewayServer.java:422)
    at org.apache.hadoop.gateway.GatewayServer.startGateway(GatewayServer.java:295)
    at org.apache.hadoop.gateway.GatewayServer.main(GatewayServer.java:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.gateway.launcher.Invoker.invokeMainMethod(Invoker.java:70)
    at org.apache.hadoop.gateway.launcher.Invoker.invoke(Invoker.java:39)
    at org.apache.hadoop.gateway.launcher.Command.run(Command.java:99)
    at org.apache.hadoop.gateway.launcher.Launcher.run(Launcher.java:69)
    at org.apache.hadoop.gateway.launcher.Launcher.main(Launcher.java:46)

2018-07-10 09:55:07,589 ERROR digester3.Digester (Digester.java:parse(1652)) - An error occurred while parsing XML from '(already loaded from stream)', see nested exceptions
org.xml.sax.SAXParseException; lineNumber: 41; columnNumber: 76; The reference to entity "ServiceAccounts" must end with the ';' delimiter.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
    at org.apache.commons.digester3.Digester.parse(Digester.java:1642)
    at org.apache.commons.digester3.Digester.parse(Digester.java:1701)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.loadTopologyAttempt(DefaultTopologyService.java:124)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.loadTopology(DefaultTopologyService.java:100)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.loadTopologies(DefaultTopologyService.java:235)
    at org.apache.hadoop.gateway.services.topology.impl.DefaultTopologyService.reloadTopologies(DefaultTopologyService.java:320)
    at org.apache.hadoop.gateway.GatewayServer.start(GatewayServer.java:422)
    at org.apache.hadoop.gateway.GatewayServer.startGateway(GatewayServer.java:295)
    at org.apache.hadoop.gateway.GatewayServer.main(GatewayServer.java:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.gateway.launcher.Invoker.invokeMainMethod(Invoker.java:70)
    at org.apache.hadoop.gateway.launcher.Invoker.invoke(Invoker.java:39)
    at org.apache.hadoop.gateway.launcher.Command.run(Command.java:99)

2018-07-10 09:55:07,592 ERROR hadoop.gateway (DefaultTopologyService.java:loadTopologies(252)) - Failed to load topology /usr/hdp/2.6.1.0-129/knox/bin/../conf/topologies/sample.xml: org.xml.sax.SAXParseException; lineNumber: 41; columnNumber: 76; The reference to entity "ServiceAccounts" must end with the ';' delimiter.
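The SAXParseException means line 41 of sample.xml contains a bare `&` before the text "ServiceAccounts": in XML, `&` starts an entity reference that must end with `;`, so a literal ampersand has to be written `&amp;`. A small sketch reproducing the class of error and the fix; the element and value below are made up, since the actual line 41 is not shown in the post:

```python
# The gateway.log error means the topology XML contains a bare '&'
# (the parser reads '&ServiceAccounts ...' as an unterminated entity
# reference). This element/value is invented to illustrate the failure.
import xml.etree.ElementTree as ET

bad = "<param><value>admin&ServiceAccounts group</value></param>"

parse_error = None
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    parse_error = err  # same class of failure the SAX parser reports

# Escaping the ampersand as '&amp;' makes the document well-formed again.
root = ET.fromstring(bad.replace("&", "&amp;"))
print(parse_error is not None, root.find("value").text)
```

The practical fix is the same: edit sample.xml at line 41, column 76 and replace the bare `&` with `&amp;`, then let Knox reload the topology.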
						
					
			
    
	
		
		
07-10-2018 10:27 AM
					
@Felix Albani @Sindhu @Jay Kumar SenSharma @Geoffrey Shelton Okot

Hi, the Knox gateway is going down on a daily basis. I found only knox-gc.log with today's timestamp in the /var/log/knox/ directory. If you think it is due to an allocation failure, can you please tell me how much memory I need to allocate for the Knox gateway?

In my gateway.sh file I found the below line. Do I need to change the values here or in any other place, and how much memory should I give to resolve this issue?

APP_MEM_OPTS="-Xmx5g -XX:NewSize=3G -XX:MaxNewSize=3G -verbose:gc -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -Xloggc:/var/log/knox/knox-gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"

Thanks in advance.
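One thing worth seeing in those flags is how the heap is divided: `-Xmx5g` caps the whole heap at 5 GB while `-XX:NewSize=3G`/`-XX:MaxNewSize=3G` pin 3 GB of it to the young generation, leaving only about 2 GB for the old generation. A hypothetical helper (not official sizing guidance) that parses the split out of the quoted string, assuming g/G units only:

```python
# Hypothetical helper: pull -Xmx and -XX:NewSize out of the APP_MEM_OPTS
# string from gateway.sh to see how the heap is divided. Handles only
# whole-gigabyte values (g/G suffix), which is all this string uses.
import re

APP_MEM_OPTS = ("-Xmx5g -XX:NewSize=3G -XX:MaxNewSize=3G -verbose:gc "
                "-XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC")

def heap_split(opts):
    """Return (max_heap_gb, young_gen_gb) parsed from the JVM options."""
    xmx = int(re.search(r"-Xmx(\d+)[gG]", opts).group(1))
    new = int(re.search(r"-XX:NewSize=(\d+)[gG]", opts).group(1))
    return xmx, new

xmx_gb, new_gb = heap_split(APP_MEM_OPTS)
# With -Xmx5g and -XX:NewSize=3G, only ~2 GB remains for the old generation.
print(f"heap={xmx_gb}G young={new_gb}G old~={xmx_gb - new_gb}G")
```

Whether 5 GB total is enough depends on the workload, which only the full GC log can show; but a young generation that is 60% of the heap is an unusual ratio worth questioning before adding more memory.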
						
					
			
    
	
		
		
07-05-2018 10:06 AM
					
Hi, in my cluster the Knox gateway was down, and I found one issue in knox-gc.log. Please find the error below:

2018-07-05T01:55:29.588+0000: 314055.583: [GC (Allocation Failure) 2018-07-05T01:55:29.887+0000: 314055.882: [ParNew: 2532112K->10894K(2831168K), 0.0440397 secs] 2532112K->10894K(2837312K), 0.3432294 secs] [Times: user=0.03 sys=0.00, real=0.35 secs]

Please help me figure out how to solve this.
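Reading the line itself: "Allocation Failure" is the normal trigger for a ParNew minor collection (the young generation filled up), not an error in itself, and the numbers show occupancy before->after(capacity) plus the pause time. A small sketch parsing the exact line quoted above:

```python
# Parse the ParNew line from knox-gc.log quoted above. "Allocation Failure"
# just means the young generation filled and triggered a minor GC; the
# numbers are occupancy before->after(capacity) and the collection time.
import re

line = ("2018-07-05T01:55:29.588+0000: 314055.583: [GC (Allocation Failure) "
        "2018-07-05T01:55:29.887+0000: 314055.882: "
        "[ParNew: 2532112K->10894K(2831168K), 0.0440397 secs] "
        "2532112K->10894K(2837312K), 0.3432294 secs] "
        "[Times: user=0.03 sys=0.00, real=0.35 secs]")

m = re.search(r"ParNew: (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs", line)
before_kb, after_kb, capacity_kb = map(int, m.group(1, 2, 3))
pause_s = float(m.group(4))

# The collection reclaimed almost the whole young generation in ~44 ms.
print(f"reclaimed {before_kb - after_kb} KB, pause {pause_s:.4f} s")
```

So this particular line describes a healthy, fast minor GC; the cause of the gateway going down would more likely be elsewhere (e.g. the gateway log rather than the GC log).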
						
					
		
			
				
						
Labels:
- Apache Knox
			
    
	
		
		
07-02-2018 12:18 PM
					
In my cluster the two NameNodes are going down. Assume that server1 has the active NameNode and server2 has the standby NameNode. Sometimes the active NameNode goes down and the standby NameNode takes over as active; sometimes the standby NameNode goes down. How do I find the corrupted JournalNode, where do I need to get the JournalNode data (fsimage, edit logs) from, and where do I need to put it?
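A hypothetical sketch, not an official tool: JournalNodes store edit-log segments (the fsimage lives on the NameNodes) as files named `edits_<first>-<last>` plus an in-progress `edits_inprogress_<txid>`, so comparing the highest transaction id visible in each JournalNode's edits directory can point at the one that has fallen behind. The directory listings below are invented for illustration:

```python
# Hypothetical sketch: compare the highest edit-log transaction id found in
# each JournalNode's edits directory (files named edits_<first>-<last> or
# edits_inprogress_<txid>) to spot a lagging/corrupted JournalNode.
import re

def last_txid(filenames):
    """Highest transaction id visible in a list of edit-segment file names."""
    best = 0
    for name in filenames:
        m = re.match(r"edits_(?:inprogress_)?0*(\d+)(?:-0*(\d+))?$", name)
        if m:
            best = max(best, int(m.group(2) or m.group(1)))
    return best

# Made-up listings of each JournalNode's <edits dir>/current directory:
journal_nodes = {
    "jn1": ["edits_0000000000000000001-0000000000000000050",
            "edits_inprogress_0000000000000000051"],
    "jn2": ["edits_0000000000000000001-0000000000000000050",
            "edits_inprogress_0000000000000000051"],
    "jn3": ["edits_0000000000000000001-0000000000000000030"],  # lagging
}
for jn, files in sorted(journal_nodes.items()):
    print(jn, last_txid(files))
```

In this invented example jn3 stops at transaction 30 while the other two have reached 51, which is the kind of asymmetry a corrupted or lagging JournalNode would show; the NameNode and JournalNode logs would still be needed to confirm why.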
						
					