Member since: 09-02-2014
Posts: 9
Kudos Received: 7
Solutions: 3
My Accepted Solutions
Views | Posted |
---|---|
3902 | 01-28-2015 02:26 PM |
86495 | 01-09-2015 02:05 PM |
12245 | 09-03-2014 12:27 PM |
07-07-2015 03:41 PM
In our case I had accidentally set the default "user limits" value for "max running apps per user" to 1, and all of our jobs need more than one application running at a time per user. This is configured in Clusters -> Dynamic resource pools -> Configuration -> User limits -> Default settings.

It could also be that your jobs are waiting for resources to become available before starting. Perhaps you have too few resources available for what is being requested?
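If you would rather check this from a shell than from the UI, here is a minimal sketch. It assumes the limit ends up as a `userMaxAppsDefault` element in the Fair Scheduler allocation file your ResourceManager was started with. The function name and the file location are my assumptions, not something taken from the posts here, so adjust them for your cluster.

```shell
# Print the userMaxAppsDefault value found in a Fair Scheduler allocation
# file, or "unset" if the element is not present.
# Assumption: the element is written on a single line, as in
#   <userMaxAppsDefault>1</userMaxAppsDefault>
user_max_apps_default() {
  value=$(sed -n 's:.*<userMaxAppsDefault>\([0-9][0-9]*\)</userMaxAppsDefault>.*:\1:p' "$1" | head -n 1)
  echo "${value:-unset}"
}
```

Call it as `user_max_apps_default /path/to/fair-scheduler.xml`, pointing at the allocation file the ResourceManager actually uses. A value of 1 would match what I described above: one application per user runs and the others wait.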
03-24-2015 12:54 PM
You are correct, the steps you took shouldn't have been necessary, but it sounds like my original issue was slightly different from yours. Were you able to see anything in the cloudera-scm-agent logs after the restart that might have pointed to an issue creating the symlinks? You should have seen things like:

[03/Sep/2014 00:06:23 +0000] 21109 Thread-13 parcel_cache INFO Checking checksum of parcel CDH-5.1.2-1.cdh5.1.2.p0.3-el6.parcel...
[03/Sep/2014 00:06:28 +0000] 21109 Thread-13 parcel_cache INFO Unpacking /opt/cloudera/parcel-cache/CDH-5.1.2-1.cdh5.1.2.p0.3-el6.parcel into /opt/cloudera/parcels
[03/Sep/2014 00:06:56 +0000] 21109 MainThread parcel INFO Loading parcel manifest for: CDH-5.1.2-1.cdh5.1.2.p0.3
[03/Sep/2014 00:06:56 +0000] 21109 MainThread parcel INFO Ensuring users/groups exist for new parcel CDH-5.1.2-1.cdh5.1.2.p0.3.
...snip...
[03/Sep/2014 00:06:57 +0000] 21109 MainThread parcel INFO Ensuring correct file permissions for new parcel CDH-5.1.2-1.cdh5.1.2.p0.3.
...snip...
[03/Sep/2014 00:07:15 +0000] 21109 MainThread parcel INFO Activating system symlinks for parcel CDH-5.1.2-1.cdh5.1.2.p0.3
[03/Sep/2014 00:07:15 +0000] 21109 MainThread parcel INFO Ensuring alternatives entries are activated for parcel CDH-5.1.2-1.cdh5.1.2.p0.3.

What was the status of your /opt/cloudera/parcels directory? Had you just recently changed to a new version? Did you deactivate the old parcel before activating the new parcel? Sorry I didn't respond earlier to prevent you from having to reinstall.
01-28-2015 02:26 PM
Stopping and restarting CM and the CM database seems to have fixed things.
01-28-2015 01:31 PM
I'm having a problem with the dynamic resource pools page. When I click the link to it, I get a NullPointerException. I'm still able to access the static resource pools page.

Server Error
A server error has occurred. Send the following information to Cloudera.
Path: https://server:7183/cmf/services/30/pools/status
Version: Cloudera Express 5.2.0 (#60 built by jenkins on 20141012-2239 git: 179000584849e68f98ad2a7fe710723bd6c29c98)
java.lang.NullPointerException: at PredefinedViews.java line 1643 in com.cloudera.cmon.components.PredefinedViews getPoolsStatusView()

Stack Trace:
PredefinedViews.java line 1643 in com.cloudera.cmon.components.PredefinedViews getPoolsStatusView()
PoolsController.java line 112 in com.cloudera.server.web.cmf.rman.pools.PoolsController getStatusPage()
<generated> line -1 in com.cloudera.server.web.cmf.rman.pools.PoolsController$$FastClassByCGLIB$$dc73a8b invoke()
MethodProxy.java line 191 in net.sf.cglib.proxy.MethodProxy invoke()
Cglib2AopProxy.java line 617 in org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor intercept()
<generated> line -1 in com.cloudera.server.web.cmf.rman.pools.PoolsController$$EnhancerByCGLIB$$8ed025bd getStatusPage()
line -1 in sun.reflect.GeneratedMethodAccessor2333 invoke()
DelegatingMethodAccessorImpl.java line 43 in sun.reflect.DelegatingMethodAccessorImpl invoke()
Method.java line 606 in java.lang.reflect.Method invoke()
HandlerMethodInvoker.java line 176 in org.springframework.web.bind.annotation.support.HandlerMethodInvoker invokeHandlerMethod()
AnnotationMethodHandlerAdapter.java line 436 in org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter invokeHandlerMethod()
AnnotationMethodHandlerAdapter.java line 424 in org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter handle()
DispatcherServlet.java line 790 in org.springframework.web.servlet.DispatcherServlet doDispatch()
DispatcherServlet.java line 719 in org.springframework.web.servlet.DispatcherServlet doService()
FrameworkServlet.java line 669 in org.springframework.web.servlet.FrameworkServlet processRequest()
FrameworkServlet.java line 574 in org.springframework.web.servlet.FrameworkServlet doGet()
HttpServlet.java line 707 in javax.servlet.http.HttpServlet service()
HttpServlet.java line 820 in javax.servlet.http.HttpServlet service()
ServletHolder.java line 511 in org.mortbay.jetty.servlet.ServletHolder handle()
ServletHandler.java line 1221 in org.mortbay.jetty.servlet.ServletHandler$CachedChain doFilter()
UserAgentFilter.java line 78 in org.mortbay.servlet.UserAgentFilter doFilter()
GzipFilter.java line 131 in org.mortbay.servlet.GzipFilter doFilter()
ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain doFilter()
JAMonServletFilter.java line 48 in com.jamonapi.http.JAMonServletFilter doFilter()
ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain doFilter()
JavaMelodyFacade.java line 109 in com.cloudera.enterprise.JavaMelodyFacade$MonitoringFilter doFilter()
ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain doFilter()
FilterChainProxy.java line 311 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
FilterSecurityInterceptor.java line 116 in org.springframework.security.web.access.intercept.FilterSecurityInterceptor invoke()
FilterSecurityInterceptor.java line 83 in org.springframework.security.web.access.intercept.FilterSecurityInterceptor doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
ExceptionTranslationFilter.java line 113 in org.springframework.security.web.access.ExceptionTranslationFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
SessionManagementFilter.java line 101 in org.springframework.security.web.session.SessionManagementFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
AnonymousAuthenticationFilter.java line 113 in org.springframework.security.web.authentication.AnonymousAuthenticationFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
RememberMeAuthenticationFilter.java line 146 in org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
SecurityContextHolderAwareRequestFilter.java line 54 in org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
RequestCacheAwareFilter.java line 45 in org.springframework.security.web.savedrequest.RequestCacheAwareFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
AbstractAuthenticationProcessingFilter.java line 182 in org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
LogoutFilter.java line 105 in org.springframework.security.web.authentication.logout.LogoutFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
SecurityContextPersistenceFilter.java line 87 in org.springframework.security.web.context.SecurityContextPersistenceFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
ConcurrentSessionFilter.java line 125 in org.springframework.security.web.session.ConcurrentSessionFilter doFilter()
FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain doFilter()
FilterChainProxy.java line 173 in org.springframework.security.web.FilterChainProxy doFilter()
DelegatingFilterProxy.java line 237 in org.springframework.web.filter.DelegatingFilterProxy invokeDelegate()
DelegatingFilterProxy.java line 167 in org.springframework.web.filter.DelegatingFilterProxy doFilter()
ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain doFilter()
CharacterEncodingFilter.java line 88 in org.springframework.web.filter.CharacterEncodingFilter doFilterInternal()
OncePerRequestFilter.java line 76 in org.springframework.web.filter.OncePerRequestFilter doFilter()
ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain doFilter()
ServletHandler.java line 399 in org.mortbay.jetty.servlet.ServletHandler handle()
SecurityHandler.java line 216 in org.mortbay.jetty.security.SecurityHandler handle()
SessionHandler.java line 182 in org.mortbay.jetty.servlet.SessionHandler handle()
SecurityHandler.java line 216 in org.mortbay.jetty.security.SecurityHandler handle()
SecurityHandler.java line 216 in org.mortbay.jetty.security.SecurityHandler handle()
ContextHandler.java line 766 in org.mortbay.jetty.handler.ContextHandler handle()
WebAppContext.java line 450 in org.mortbay.jetty.webapp.WebAppContext handle()
HandlerWrapper.java line 152 in org.mortbay.jetty.handler.HandlerWrapper handle()
StatisticsHandler.java line 53 in org.mortbay.jetty.handler.StatisticsHandler handle()
HandlerWrapper.java line 152 in org.mortbay.jetty.handler.HandlerWrapper handle()
Server.java line 326 in org.mortbay.jetty.Server handle()
HttpConnection.java line 542 in org.mortbay.jetty.HttpConnection handleRequest()
HttpConnection.java line 928 in org.mortbay.jetty.HttpConnection$RequestHandler headerComplete()
HttpParser.java line 549 in org.mortbay.jetty.HttpParser parseNext()
HttpParser.java line 212 in org.mortbay.jetty.HttpParser parseAvailable()
HttpConnection.java line 404 in org.mortbay.jetty.HttpConnection handle()
SelectChannelEndPoint.java line 410 in org.mortbay.io.nio.SelectChannelEndPoint run()
QueuedThreadPool.java line 582 in org.mortbay.thread.QueuedThreadPool$PoolThread run()
Labels: Security
01-09-2015 02:05 PM
User error. Everything was fine with the resource pools, but there was a default user limit set.
12-31-2014 06:03 PM
1 Kudo
CDH 5.2.0-1.cdh5.2.0.p0.36

We had an issue with HDFS filling up, which caused a number of services to fail. After we cleared space and restarted the cluster, we aren't able to run any hive workflows through oozie. It seems to get stuck allocating resources. No changes were made to the YARN resource configuration, which seems to be the go-to place for troubleshooting steps. We have plenty of resources allocated to YARN containers, and there are currently no app limits set in the dynamic resource pools.

When I start an oozie workflow, the oozie:launcher application starts normally, but the hive query that it executes is always stuck in the ACCEPTED state and never transitions to RUNNING.

The oozie:launcher application is accepted and scheduled.

2015-01-01 00:47:48,472 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1420073214126_0001 from user: admin, in queue: default, currently num of applications: 1
2015-01-01 00:47:48,475 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1420073214126_0001 State change from SUBMITTED to ACCEPTED
2015-01-01 00:47:48,475 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1420073214126_0001_000001
2015-01-01 00:47:48,476 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0001_000001 State change from NEW to SUBMITTED
2015-01-01 00:47:48,490 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1420073214126_0001_000001 to scheduler from user: admin
2015-01-01 00:47:48,492 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0001_000001 State change from SUBMITTED to SCHEDULED

The oozie:launcher container is allocated and acquired.

2015-01-01 00:47:54,514 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1420073214126_0001_01_000001 Container Transitioned from NEW to ALLOCATED
2015-01-01 00:47:54,514 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=admin OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1420073214126_0001 CONTAINERID=container_1420073214126_0001_01_000001
2015-01-01 00:47:54,514 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1420073214126_0001_01_000001 of capacity <memory:1024, vCores:1> on host node:8041, which has 1 containers, <memory:1024, vCores:1> used and <memory:23552, vCores:11> available after allocation
2015-01-01 00:47:54,516 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : ascn07.idc1.level3.com:8041 for container : container_1420073214126_0001_01_000001
2015-01-01 00:47:54,520 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1420073214126_0001_01_000001 Container Transitioned from ALLOCATED to ACQUIRED

The oozie:launcher application is allocated, launched, and starts running.

2015-01-01 00:47:54,559 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0001_000001 State change from SCHEDULED to ALLOCATED_SAVING
2015-01-01 00:47:54,568 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0001_000001 State change from ALLOCATED_SAVING to ALLOCATED
2015-01-01 00:47:54,575 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1420073214126_0001_000001
<snip>
2015-01-01 00:47:54,834 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0001_000001 State change from ALLOCATED to LAUNCHED
2015-01-01 00:47:55,094 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1420073214126_0001_01_000001 Container Transitioned from ACQUIRED to RUNNING
2015-01-01 00:47:59,724 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1420073214126_0001_000001
2015-01-01 00:47:59,725 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=admin IP=1.1.1.1 OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1420073214126_0001 APPATTEMPTID=appattempt_1420073214126_0001_000001
2015-01-01 00:47:59,725 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0001_000001 State change from LAUNCHED to RUNNING

Then the next job begins, which is a hive job. It transitions from NEW to SCHEDULED, but a new container is never created or allocated.

2015-01-01 00:48:14,119 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 2 submitted by user admin
2015-01-01 00:48:14,119 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1420073214126_0002
2015-01-01 00:48:14,119 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=admin IP=1.1.1.1 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1420073214126_0002
2015-01-01 00:48:14,120 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1420073214126_0002 State change from NEW to NEW_SAVING
2015-01-01 00:48:14,120 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1420073214126_0002
2015-01-01 00:48:14,120 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1420073214126_0002 State change from NEW_SAVING to SUBMITTED
2015-01-01 00:48:14,120 WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user admin
2015-01-01 00:48:14,120 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1420073214126_0002 from user: admin, in queue: default, currently num of applications: 2
2015-01-01 00:48:14,121 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1420073214126_0002 State change from SUBMITTED to ACCEPTED
2015-01-01 00:48:14,121 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1420073214126_0002_000001
2015-01-01 00:48:14,121 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0002_000001 State change from NEW to SUBMITTED
2015-01-01 00:48:14,121 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1420073214126_0002_000001 to scheduler from user: admin
2015-01-01 00:48:14,121 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1420073214126_0002_000001 State change from SUBMITTED to SCHEDULED

At this point the job never progresses. In CM -> YARN Applications it has a status of "Pending". On the ResourceManager UI it has a state of "ACCEPTED" but never transitions to "RUNNING".

This issue is mentioned in a blog post from April (#5): http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

The suggested fix of adding a value to "max running apps" has no effect.
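For anyone comparing against their own logs, here is a rough sketch of how the pattern above could be picked out of a ResourceManager log automatically. It assumes the log uses the same `RMAppAttemptImpl ... State change from X to Y` lines quoted in this post. The function name `stuck_attempts` is made up for illustration, and the log path is whatever your ResourceManager writes to.

```shell
# Print app attempts that reached SCHEDULED but never left it,
# based on "State change from X to Y" lines in a ResourceManager log.
stuck_attempts() {
  awk '
    /State change from/ {
      id = ""
      # The attempt ID is the field that starts with "appattempt_".
      for (i = 1; i <= NF; i++) if ($i ~ /^appattempt_/) id = $i
      if (id == "") next                      # app-level line, not an attempt
      if ($0 ~ /to SCHEDULED[[:space:]]*$/) scheduled[id] = 1
      if ($0 ~ /from SCHEDULED to /) left[id] = 1
    }
    END { for (id in scheduled) if (!(id in left)) print id }
  ' "$1" | sort
}
```

Run against the excerpt above, `stuck_attempts /path/to/resourcemanager.log` should list only the hive attempt, since the oozie:launcher attempt goes on to ALLOCATED_SAVING.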
09-03-2014 12:27 PM
4 Kudos
I have found the solution to the problem.

The cloudera-scm-agent runs a tool called /usr/lib64/cmf/service/common/alternatives.sh to generate /etc/alternatives and the symlinks in /usr/bin/*. This bash script executes update-alternatives based on the PARCELS_DIR and PARCEL_DIRNAME variables.

There are files in /var/lib/alternatives/ which seem to be used as overrides for the update-alternatives tool. Regardless of what you give update-alternatives, if there is a file in /var/lib/alternatives for that same alternative name, it will use the information from the /var/lib/alternatives file.

For some reason the /var/lib/alternatives files for cloudera have two entries in them, one for the old parcel and one for the new parcel. This may have happened by reinstalling without deactivating the old cluster/parcel first.

# cat /var/lib/alternatives/sqoop
auto
/usr/bin/sqoop
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/sqoop
10
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/bin/sqoop
10

When I remove the /var/lib/alternatives/sqoop-import file and restart cloudera-scm-agent, the proper symlink is created.

# ls -lsa /etc/alternatives/sqoop-import
4 lrwxrwxrwx 1 root root 64 Sep 3 19:00 /etc/alternatives/sqoop-import -> /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/bin/sqoop-import

So, in closing, it may be necessary to remove all cloudera-related /var/lib/alternatives/ files after a botched install if you did not deactivate the parcel prior to the reinstall.

# grep -l cloudera /var/lib/alternatives/*
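The grep above lists every cloudera-related file. A slightly narrower check, sketched below, lists only the files that still mention a parcel other than the active one. It assumes the state files live in /var/lib/alternatives and the parcels under /opt/cloudera/parcels, as in this post. `stale_alternatives` is a hypothetical helper name, and it only prints file names; it removes nothing.

```shell
# List alternatives state files in DIR that reference a parcel path
# other than the active parcel directory name ACTIVE.
stale_alternatives() {
  dir=$1
  active=$2
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Keep only lines that point into the parcels tree, then check whether
    # any of them is NOT under the active parcel (-F: match literally).
    if grep '/opt/cloudera/parcels/' "$f" | grep -qvF "/opt/cloudera/parcels/$active/"; then
      echo "$f"
    fi
  done
}
```

For the situation described here that would be `stale_alternatives /var/lib/alternatives CDH-5.1.2-1.cdh5.1.2.p0.3`. Review the list before deleting anything, then restart cloudera-scm-agent so the symlinks are regenerated.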
09-02-2014 03:50 PM
I am attempting to reinstall my cluster, and no matter what I do the installation sets the /etc/alternatives binary symlinks to an old/invalid parcel version. I followed http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/cm5ig_uninstall_cm.html

During the reinstallation process the parcels are distributed.

[parcels]$ ls -lsa /opt/cloudera/parcels
total 12
4 drwxr-xr-x 3 root root 4096 Sep 2 22:40 .
4 drwxr-xr-x 4 root root 4096 Sep 2 22:33 ..
0 lrwxrwxrwx 1 root root 25 Sep 2 22:40 CDH -> CDH-5.1.2-1.cdh5.1.2.p0.3
4 drwxrwxr-x 10 root root 4096 Aug 26 04:03 CDH-5.1.2-1.cdh5.1.2.p0.3

But the cloudera-scm-agent is creating symlinks in /etc/alternatives to a different parcel version (CDH-5.1.0-1.cdh5.1.0.p0.53).

[parcels]$ ls -lsa /etc/alternatives | grep cloudera
4 lrwxrwxrwx 1 root root 63 Sep 2 22:40 avro-tools -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/avro-tools
4 lrwxrwxrwx 1 root root 60 Sep 2 22:40 beeline -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/beeline
4 lrwxrwxrwx 1 root root 61 Sep 2 22:40 catalogd -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/catalogd
0 lrwxrwxrwx 1 root root 59 Sep 2 22:40 cli_mt -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/cli_mt
0 lrwxrwxrwx 1 root root 59 Sep 2 22:40 cli_st -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/cli_st
4 lrwxrwxrwx 1 root root 61 Sep 2 22:40 flume-ng -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/flume-ng

Where in the HECK is this information stored, either on the CM server or on the agents? It's pretty important for a user to be able to remove software completely and reinstall it without having to reinstall the whole operating system, which is the point I'm at right now.