About bvk

bvk · 10-07-2018

Hi, what is the maximum number of nodes that can be managed using Cloudera Manager? Are there any restrictions or limitations, or is it unlimited? I remember reading somewhere on Stack Overflow that the limit is 50, and I wanted to confirm. Can you please share some info on this?

bvk · 08-27-2018

Hi Tomas, I am using RHEL 7.1.

bvk · 08-26-2018

Hi, I have a 3-node Hadoop cluster running CDH 5.10.0, Java version 1.8.0_171. When I start all the services, they all start fine. But after 3-4 minutes, the health of every Node Manager turns bad with unexpected exits, and soon after that the Resource Manager also stops working. Once the Resource Manager has completely stopped, all the Node Managers show good health again, but the Resource Manager stays in the stopped state. Below are a few random logs.

Node Manager log:

Unable to recover container container_1535300340310_0001_01_000001
java.io.IOException: Timeout while waiting for exit code from container_1535300340310_0001_01_000001
    at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:199)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Resource Manager logs:

2018-08-05 13:56:29,100 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 1: SIGHUP
2018-08-05 13:56:29,131 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2018-08-05 13:56:29,136 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@ip-10-0-0-6.ec2.internal:8088
2018-08-05 13:56:29,137 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2018-08-05 13:56:29,145 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2018-08-05 13:56:29,149 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2018-08-05 13:56:29,157 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2018-08-05 13:56:29,159 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-08-05 13:56:29,162 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2018-08-05 13:56:29,163 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2018-08-05 13:56:29,163 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-08-05 13:56:29,165 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2018-08-05 13:56:29,166 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2018-08-05 13:56:29,169 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2018-08-26 12:20:40,707 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1535300340310_0010_02_000001 Container Transitioned from RUNNING to COMPLETED
2018-08-26 12:20:40,707 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_1535300340310_0010_02_000001 in state: COMPLETED event:FINISHED
2018-08-26 12:20:40,707 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1535300340310_0010 CONTAINERID=container_1535300340310_0010_02_000001
2018-08-26 12:20:40,707 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1535300340310_0010_02_000001 of capacity <memory:1024, vCores:1> on host ip-10-0-0-6.ec2.internal:8041, which currently has 1 containers, <memory:1024, vCores:1> used and <memory:2262, vCores:3> available, release resources=true
2018-08-26 12:20:40,707 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1535300340310_0010_000002 released container container_1535300340310_0010_02_000001 on node: host: ip-10-0-0-6.ec2.internal:8041 #containers=1 available=2262 used=1024 with event: FINISHED
2018-08-26 12:20:40,708 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1535300340310_0010_000002 with final state: FAILED, and exit status: 0
2018-08-26 12:20:40,708 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1535300340310_0010_000002 State change from LAUNCHED to FINAL_SAVING
2018-08-26 12:20:40,708 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1535300340310_0010_000002
2018-08-26 12:20:40,708 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1535300340310_0010_000002
2018-08-26 12:20:40,708 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1535300340310_0010_000002 State change from FINAL_SAVING to FAILED
2018-08-26 12:20:40,708 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2018-08-26 12:20:40,709 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1535300340310_0010 with final state: FAILED
2018-08-26 12:20:40,709 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1535300340310_0010 State change from ACCEPTED to FINAL_SAVING
2018-08-26 12:20:40,709 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1535300340310_0010
2018-08-26 12:20:40,709 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1535300340310_0010_000002 is done. finalState=FAILED
2018-08-26 12:20:40,709 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1535300340310_0010 requests cleared
2018-08-26 12:20:40,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1535300340310_0010 failed 2 times due to AM Container for appattempt_1535300340310_0010_000002 exited with exitCode: 0 For more detailed output, check application tracking page:http://ip-10-0-0-6.ec2.internal:8088/proxy/application_1535300340310_0010/Then, click on links to logs of each attempt. Diagnostics: Failing this attempt. Failing the application.
2018-08-26 12:20:40,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1535300340310_0010 State change from FINAL_SAVING to FAILED
2018-08-26 12:20:40,766 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1535300340310_0010 failed 2 times due to AM Container for appattempt_1535300340310_0010_000002 exited with exitCode: 0 For more detailed output, check application tracking page:http://ip-10-0-0-6.ec2.internal:8088/proxy/application_1535300340310_0010/Then, click on links to logs of each attempt. Diagnostics: Failing this attempt. Failing the Diagnostics: Failing this attempt. Failing the application.
2018-08-26 12:20:42,322 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1535300340310_0014 State change from FINAL_SAVING to FAILED
2018-08-26 12:20:42,322 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1535300340310_0014 failed 2 times due to AM Container for appattempt_1535300340310_0014_000002 exited with exitCode: 0 For more detailed output, check application tracking page:http://ip-10-0-0-6.ec2.internal:8088/proxy/application_1535300340310_0014/Then, click on links to logs of each attempt. Diagnostics: Failing this attempt. Failing the application. APPID=application_1535300340310_0014
2018-08-26 12:20:42,322 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1535300340310_0014,name=hadoop,user=dr.who,queue=root.users.dr_dot_who,state=FAILED,trackingUrl=http://ip-10-0-0-6.ec2.internal:8088/cluster/app/application_1535300340310_0014,appMasterHost=N/A,startTime=1535300420133,finishTime=1535300442269,finalStatus=FAILED
2018-08-26 12:20:42,584 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1535300340310_0013_02_000001 Container Transitioned from ACQUIRED to RUNNING
2018-08-26 12:20:42,595 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Making reservation: node=ip-10-0-0-10.ec2.internal app_id=application_1535300340310_0015
2018-08-26 12:20:42,595 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1535300340310_0015_02_000001 Container Transitioned from NEW to RESERVED
2018-08-26 12:20:42,595 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Reserved container container_1535300340310_0015_02_000001 on node host: ip-10-0-0-10.ec2.internal:8041 #containers=3 available=638 used=3072 for application application_1535300340310_0015

Note: Whenever I restart the YARN service, all the roles start without any issues, but after a few minutes the Node Managers show bad health, and soon after that the Resource Manager goes down. Please help me understand the issue and resolve it. Thanks in advance.
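
With this many entries, it may help to pull out just the applications that ended in FAILED. The following is a minimal Python sketch, assuming the "State change from X to Y" message format visible in the RMAppImpl lines above; the function names are hypothetical and the regex has only been matched against the log lines quoted in this post.

```python
import re

# Matches RMAppImpl messages such as:
#   "application_1535300340310_0010 State change from FINAL_SAVING to FAILED"
# Attempt-level lines start with "appattempt_" and are deliberately not matched.
STATE_CHANGE = re.compile(
    r"(application_\d+_\d+) State change from (\w+) to (\w+)"
)


def final_states(log_text):
    """Return {application_id: last state seen} over all state-change messages."""
    states = {}
    for app_id, _old_state, new_state in STATE_CHANGE.findall(log_text):
        # Later matches overwrite earlier ones, so the last transition wins.
        states[app_id] = new_state
    return states


def failed_apps(log_text):
    """Return the sorted application IDs whose last seen state is FAILED."""
    return sorted(
        app_id
        for app_id, state in final_states(log_text).items()
        if state == "FAILED"
    )
```

Passing the contents of the Resource Manager log to failed_apps() would list each application whose last recorded transition was to FAILED.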

bvk · 07-23-2018

You can integrate LDAP and use it to create and manage users across nodes.

References:
https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cm_sg_external_auth.html
https://www.youtube.com/watch?v=Tpx0uNXJh7U

bvk · 07-22-2018

Hi, I am using CDH 5.10.2. On one of my Hadoop machines, I am not able to start the NFS Gateway service even after starting rpcbind. nfs3_jsvc.err shows the error below. What could be the reason, and how do I resolve it?

Cannot start daemon
Service exit with a return value of 5
Initializing privileged NFS client socket...
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
Caused by: java.lang.UnsatisfiedLinkError: no management in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:67)
    at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:47)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.management.ManagementFactoryHelper.<clinit>(ManagementFactoryHelper.java:425)
    at java.lang.management.ManagementFactory.getThreadMXBean(ManagementFactory.java:336)
    at org.apache.hadoop.util.ReflectionUtils.<clinit>(ReflectionUtils.java:137)
    at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder.initRegistry(MetricsSourceBuilder.java:92)
    at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder.<init>(MetricsSourceBuilder.java:56)
    at org.apache.hadoop.metrics2.lib.MetricsAnnotations.makeSource(MetricsAnnotations.java:37)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSystemSource(MetricsSystemImpl.java:558)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSources(MetricsSystemImpl.java:536)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:482)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:188)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:163)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
    at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.createRpcProgramNfs3(RpcProgramNfs3.java:219)
    at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.<init>(Nfs3.java:45)
    at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startService(Nfs3.java:67)
    at org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter.start(PrivilegedNfsGatewayStarter.java:60)
    ... 5 more

bvk · 07-15-2018

Hi, even I am not sure what happened. After trying the steps below multiple times, I kept getting the option to select the specific release of the CM agents. So I rebooted all my machines, and after the reboot I opened the console again. This time I was not asked to select the specific version of the software to install; the download-distribute-unpack-activate window came up directly, and the installation was successful. Any idea what the issue was, and how the reboot fixed it?

Regards,
B.V.K

bvk · 07-01-2018

Hi, I am trying to install CDH 5.15.0 on a 3-node RHEL 7.1 cluster by manually installing the software on each host.

CM: 5.15.0
CDH: 5.15.0
JDK: jdk1.8.0_171

After setting up a local repository, I installed the CM server, agent and daemons on the master node and the agent and daemons on the other nodes, and untarred the JDK tar file mentioned above to /opt.
I used MySQL as the Cloudera Manager DB and ran scm_prepare_database.sh to create the scm DB.
I updated JAVA_HOME in /etc/default/cloudera-scm-server on the master node.
I updated server_host in /etc/cloudera-scm-agent/config.ini on all nodes.
I started the CM server and agent on the master node, and the agents on the other nodes.

I am able to log in to the Cloudera console, the hosts were detected, and I landed on the cluster installation wizard. Now I see the prompt "Select the specific release of the Cloudera Manager Agent you want to install on your hosts." I have already installed the necessary software on all nodes and updated the config.ini file as well, so I shouldn't be asked to select the specific release of the agents to install. Am I missing any steps? How do I prevent Cloudera Manager from installing software on the other nodes? Thanks in advance!
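
One thing that could be verified on each host before opening the wizard is that the agent really points at the intended server. The following is a minimal Python sketch that reads server_host from an agent config.ini. It assumes the setting sits under a [General] section, as in a stock agent config; the function name and the host value in the usage note are hypothetical.

```python
import configparser


def read_server_host(path, section="General"):
    """Return the server_host value from an agent config.ini, or None if unset."""
    # interpolation=None keeps any '%' characters in the file literal;
    # allow_no_value tolerates bare keys without '='.
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    # ConfigParser.read() silently skips unreadable files, so check its result.
    if not parser.read(path):
        raise FileNotFoundError(path)
    return parser.get(section, "server_host", fallback=None)
```

For example, read_server_host("/etc/cloudera-scm-agent/config.ini") could be compared against the master node's hostname (say, "master.example.com") on every node.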

bvk · 11-20-2017

Thanks a lot. This resolved the issue :) I have one more question. If I get a Java heap size issue such as

Caused by: java.lang.OutOfMemoryError: Java heap space

when running a MapReduce job, how can I increase the Java heap size at runtime? Does "-Dmapreduce.map.java.opts=-Xmx2048m" really do anything here? I didn't see any change. Could you please advise on the best way to increase the Java heap size? Thanks in advance.
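
For context, -D options are only picked up when the job's driver parses generic options (for example through ToolRunner), and they have to come before the job's own arguments. The following is a minimal Python sketch that builds such a command line. Sizing the heap at 80% of the container memory is a common rule of thumb and an assumption here, not something confirmed in this thread; the function names are hypothetical.

```python
def map_heap_options(container_mb, heap_percent=80):
    """Build -D options that size the map container and its JVM heap together.

    heap_percent is an assumed rule of thumb: the JVM heap is kept below the
    container size to leave headroom for non-heap memory.
    """
    heap_mb = container_mb * heap_percent // 100
    return [
        f"-Dmapreduce.map.memory.mb={container_mb}",
        f"-Dmapreduce.map.java.opts=-Xmx{heap_mb}m",
    ]


def build_command(jar, program, program_args, container_mb):
    """Place the generic -D options before the program's own arguments."""
    return [
        "hadoop", "jar", jar, program,
        *map_heap_options(container_mb),
        *program_args,
    ]
```

Calling build_command("hadoop-examples.jar", "pi", ["10", "10"], 2560) puts both -D options between "pi" and the two sample counts, which is the position generic options need.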

bvk · 11-20-2017

Hi, I have an 8-node cluster. When I submit a job (the Pi program) on the edge node, it creates a local job and executes it:

hadoop jar /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/hadoop-examples.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
17/11/20 08:47:57 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/20 08:47:57 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/20 08:47:58 INFO input.FileInputFormat: Total input paths to process : 10
17/11/20 08:47:58 INFO mapreduce.JobSubmitter: number of splits:10
17/11/20 08:47:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local635221628_0001
17/11/20 08:47:58 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/11/20 08:47:58 INFO mapreduce.Job: Running job: job_local635221628_0001
17/11/20 08:47:58 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/20 08:47:58 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/11/20 08:47:58 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/20 08:47:58 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/20 08:47:58 INFO mapred.LocalJobRunner: Starting task: attempt_local635221628_0001_m_000000_0
17/11/20 08:47:58 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/11/20 08:47:58 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/11/20 08:47:58 INFO mapred.MapTask: Processing split: hdfs://nameservice-ha/user/hduser/QuasiMonteCarlo_1511185676373_1845096796/in/part0:0+118

It executes successfully, but the job ID job_localxxx cannot be tracked in the Resource Manager web UI. When I run the same job on any other node (Name Node or worker node), a proper job ID is created, which is available in the Resource Manager web UI.

I also noticed that running mapred job -list on the edge node throws the error below:

mapred job -list
17/11/20 08:52:30 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/20 08:52:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:604)
    at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:382)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1269)

And when I run yarn application -list:

17/11/20 08:52:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/11/20 08:53:00 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/20 08:53:01 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/20 08:53:02 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/20 08:53:03 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

These commands work fine on the other nodes. I have the Oozie service installed, and the Resource Manager address is set to 8032.

Can someone tell me what went wrong? How can I fix this issue?
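
A job ID of the form job_local... and a client dialing 0.0.0.0:8032 are both what Hadoop typically falls back to when the client-side configuration does not set mapreduce.framework.name or a ResourceManager address, so comparing the edge node's config files with those of a working node may be a reasonable first check. The following is a minimal Python sketch of that check; the function names are hypothetical, and the file locations in the usage note are an assumption about where the client configs live.

```python
import xml.etree.ElementTree as ET


def load_properties(path):
    """Parse a Hadoop *-site.xml file into a {name: value} dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def client_config_problems(mapred_props, yarn_props):
    """Return human-readable findings for the two fallbacks described above."""
    problems = []
    # When unset, mapreduce.framework.name falls back to the local runner.
    if mapred_props.get("mapreduce.framework.name", "local") != "yarn":
        problems.append("mapreduce.framework.name is not 'yarn'")
    # HA setups use suffixed keys (e.g. ...hostname.rm1), hence the prefix match.
    prefixes = ("yarn.resourcemanager.address", "yarn.resourcemanager.hostname")
    if not any(key.startswith(prefixes) for key in yarn_props):
        problems.append("no yarn.resourcemanager address or hostname is set")
    return problems
```

On the edge node this could be run against, for instance, /etc/hadoop/conf/mapred-site.xml and /etc/hadoop/conf/yarn-site.xml; an empty result would suggest looking elsewhere.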

bvk · 08-27-2017

Thanks a lot.. 🙂

Last Visited	06-08-2020 02:20 AM

Member Since	08-25-2017 06:11 AM
Posts	21
Kudos received	1

Cloudera Community

Re: Path-B CDH Manual Installation, asks for insta...

Max no. of nodes can be managed by Cloudera

Re: Node manager & Resource manager unexpected exi...

Node manager & Resource manager unexpected exits a...

Re: Sentry + Kerberos + Impala : manage users

NFS Gateway failed to start Caused by: java.lang.U...

Re: Path-B CDH Manual Installation, asks for insta...

Path-B CDH Manual Installation, asks for installin...

Re: Job submitted on edge node runs in local host ...

Job submitted on edge node runs in local host and ...

Re: How do I resolve clock offset issue?