Created on 12-01-2014 02:28 AM - edited 09-16-2022 02:14 AM
We have a one-node CDH 5.2 cluster with the following services running on it:
HDFS
Hive
Impala
Oozie
Sqoop 1 Client
YARN (MR2 Included)
ZooKeeper
Tasks to perform:
1. Connect to a MySQL database and import the data into HDFS using Sqoop, run independently from the console.
2. Trigger the same import from Oozie.
The first task succeeded: the Sqoop job runs fine when launched independently from the console.
Problem statement: the Sqoop action in Oozie logs nothing but "Heart beat" for a long time, and both the Oozie job and the Sqoop MapReduce job stay in the RUNNING state for a long time.
Cluster state:
1. All services are up and running (each service tested individually).
2. No other applications are running (no MapReduce jobs).
However, task 2 produces no output.
The Oozie job stays in the RUNNING state for a long time, and the YARN application summary shows 1 Apps Pending and 1 Apps Running for the whole time.
Approach 1:
-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
queueName=default
user.name=hdfs
oozie.use.system.libpath=true
outputDirPath=hdfs://cldx-1414-1259:8020/sqoopTest27Nov_clusterRestored/oozieImport/
sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopTest27Nov_clusterRestored/oozieImport/
oozie.wf.application.path=${nameNode}/user/hdfs/oozie-sqoop_action.xml

-->oozie-sqoop_action.xml:
<workflow-app xmlns='uri:oozie:workflow:0.1' name='Sqoop Action XML'>
    <start to='SqoopAction' />
    <action name="SqoopAction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDirPath}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>${sqoop_command}</command>
        </sqoop>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>

-->log:
Container log:
2014-11-28 23:55:42,345 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1417108420351_0024_000001
2014-11-28 23:55:45,081 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
2014-11-28 23:55:45,090 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-11-28 23:55:45,092 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.
2014-11-28 23:55:45,099 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring.
2014-11-28 23:55:45,109 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring.
2014-11-28 23:55:45,148 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-11-28 23:55:45,670 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2014-11-28 23:55:45,670 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3adde4f2)
2014-11-28 23:55:46,136 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: RM_DELEGATION_TOKEN, Service: 172.25.7.91:8032, Ident: (owner=hdfs, renewer=oozie mr token, realUser=oozie, issueDate=1417199124140, maxDate=1417803924140, sequenceNumber=101, masterKeyId=3)
2014-11-28 23:55:46,773 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
2014-11-28 23:55:46,779 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-11-28 23:55:46,781 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.
2014-11-28 23:55:46,783 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring.
2014-11-28 23:55:46,790 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring.
2014-11-28 23:55:46,815 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-11-28 23:55:50,206 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2014-11-28 23:55:50,211 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2014-11-28 23:55:50,522 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2014-11-28 23:55:50,532 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2014-11-28 23:55:50,547 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2014-11-28 23:55:50,566 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2014-11-28 23:55:50,573 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2014-11-28 23:55:50,588 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2014-11-28 23:55:50,602 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2014-11-28 23:55:50,615 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2014-11-28 23:55:51,853 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2014-11-28 23:55:55,752 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-11-28 23:55:56,563 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-11-28 23:55:56,569 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2014-11-28 23:55:56,733 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1417108420351_0024 to jobTokenSecretManager
2014-11-28 23:55:58,180 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1417108420351_0024 because: not enabled;
2014-11-28 23:55:58,434 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1417108420351_0024 = 0. Number of splits = 1
2014-11-28 23:55:58,435 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1417108420351_0024 = 0
2014-11-28 23:55:58,436 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from NEW to INITED
2014-11-28 23:55:58,447 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1417108420351_0024.
2014-11-28 23:55:59,470 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-11-28 23:55:59,759 INFO [Socket Reader #1 for port 38852] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 38852
2014-11-28 23:56:00,011 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2014-11-28 23:56:00,063 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-11-28 23:56:00,070 INFO [IPC Server listener on 38852] org.apache.hadoop.ipc.Server: IPC Server listener on 38852: starting
2014-11-28 23:56:00,089 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at cldx-1414-1259/172.25.7.91:38852
2014-11-28 23:56:01,011 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-11-28 23:56:01,063 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2014-11-28 23:56:01,243 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2014-11-28 23:56:01,299 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2014-11-28 23:56:01,300 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2014-11-28 23:56:01,341 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2014-11-28 23:56:01,344 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2014-11-28 23:56:01,587 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 56301
2014-11-28 23:56:01,588 INFO [main] org.mortbay.log: jetty-6.1.26
2014-11-28 23:56:02,085 INFO [main] org.mortbay.log: Extract jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hadoop-yarn-common-2.5.0-cdh5.2.0.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_56301_mapreduce____.ai0e56/webapp
2014-11-28 23:56:07,915 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:56301
2014-11-28 23:56:07,923 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 56301
2014-11-28 23:56:10,911 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-11-28 23:56:10,928 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-11-28 23:56:10,953 INFO [Socket Reader #1 for port 40428] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 40428
2014-11-28 23:56:10,970 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-11-28 23:56:10,980 INFO [IPC Server listener on 40428] org.apache.hadoop.ipc.Server: IPC Server listener on 40428: starting
2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2014-11-28 23:56:11,311 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
2014-11-28 23:56:11,322 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-11-28 23:56:11,323 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.
2014-11-28 23:56:11,324 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring.
2014-11-28 23:56:11,326 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring.
2014-11-28 23:56:11,339 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-11-28 23:56:11,371 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at cldx-1414-1259/172.25.7.91:8030
2014-11-28 23:56:11,715 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:2475, vCores:4>
2014-11-28 23:56:11,715 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.hdfs
2014-11-28 23:56:11,735 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2014-11-28 23:56:11,737 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
2014-11-28 23:56:11,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from INITED to SETUP
2014-11-28 23:56:11,831 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-11-28 23:56:12,017 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from SETUP to RUNNING
2014-11-28 23:56:12,163 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1417108420351_0024_m_000000 Task Transitioned from NEW to SCHEDULED
2014-11-28 23:56:12,208 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-11-28 23:56:12,236 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:1024, vCores:1>
2014-11-28 23:56:12,650 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1417108420351_0024, File: hdfs://cldx-1414-1259:8020/user/hdfs/.staging/job_1417108420351_0024/job_1417108420351_0024_1.jhist
2014-11-28 23:56:12,739 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-11-28 23:56:12,948 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417108420351_0024: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:1451, vCores:3> knownNMs=1
2014-11-28 23:56:17,112 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2014-11-28 23:56:17,429 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.util.RackResolver: Resolved cldx-1414-1259 to /default
2014-11-28 23:56:17,446 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1417108420351_0024_01_000002 to attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:17,466 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2014-11-28 23:56:17,920 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved cldx-1414-1259 to /default
2014-11-28 23:56:17,924 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources.
2014-11-28 23:56:18,275 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/hdfs/.staging/job_1417108420351_0024/job.xml
2014-11-28 23:56:20,314 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/ST4-4.0.4.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/ST4-4.0.4.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:20,336 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/activation-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/activation-1.1.jar This will be an error in Hadoop 2.0
org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/snappy-java-1.0.4.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/snappy-java-1.0.4.1.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,288 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/sqoop-1.4.5-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/sqoop-1.4.5-cdh5.2.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,294 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/stringtemplate-3.2.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/stringtemplate-3.2.1.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,297 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/xz-1.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/xz-1.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,300 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,302 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/oozie-hadoop-utils-2.5.0-cdh5.2.0.oozie-4.0.0-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/oozie-hadoop-utils-2.5.0-cdh5.2.0.oozie-4.0.0-cdh5.2.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,305 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/oozie-sharelib-oozie-4.0.0-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/oozie-sharelib-oozie-4.0.0-cdh5.2.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens and #1 secret keys for NM use for launching container
2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 2
2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2014-11-28 23:56:22,565 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2014-11-28 23:56:22,573 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417108420351_0024: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:427, vCores:2> knownNMs=1
2014-11-28 23:56:22,588 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1417108420351_0024_01_000002 taskAttempt attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:22,604 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:22,610 INFO [ContainerLauncher #0] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : cldx-1414-1259:8041
2014-11-28 23:56:23,297 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1417108420351_0024_m_000000_0 : 13562
2014-11-28 23:56:23,305 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1417108420351_0024_m_000000_0] using containerId: [container_1417108420351_0024_01_000002 on NM: [cldx-1414-1259:8041]
2014-11-28 23:56:23,314 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2014-11-28 23:56:23,323 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1417108420351_0024_m_000000 Task Transitioned from SCHEDULED to RUNNING
2014-11-28 23:56:28,300 INFO [Socket Reader #1 for port 40428] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1417108420351_0024 (auth:SIMPLE)
2014-11-28 23:56:28,341 INFO [IPC Server handler 0 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1417108420351_0024_m_000002 asked for a task
2014-11-28 23:56:28,342 INFO [IPC Server handler 0 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1417108420351_0024_m_000002 given task: attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:38,162 INFO [IPC Server handler 1 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1417108420351_0024_m_000000_0 is : 1.0
2014-11-28 23:57:05,823 INFO [IPC Server handler 6 on 40428]
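One hedged reading of the Approach 1 log, assuming it belongs to the Oozie launcher job (it reports "Job jar is not present" and a single split of size 0): the scheduler headroom reported as resourcelimit drops from 1451 MB to 427 MB once the launcher's 1024 MB map container is granted. Any further container, including the ApplicationMaster of the MapReduce job that Sqoop then submits, is normalized up to at least yarn.scheduler.minimum-allocation-mb (1024 MB in the yarn-site.xml posted in this thread), so on this single node it would not fit:

```latex
\begin{aligned}
\text{headroom before the map container} &= 1451\ \text{MB},\ 3\ \text{vcores} && \text{(resourcelimit at 23:56:12)}\\
\text{map container request} &= 1024\ \text{MB},\ 1\ \text{vcore} && \text{(mapResourceRequest)}\\
\text{headroom after the map container} &= 1451 - 1024 = 427\ \text{MB},\ 3 - 1 = 2\ \text{vcores} && \text{(resourcelimit at 23:56:22)}\\
\text{smallest grantable container} &= 1024\ \text{MB} > 427\ \text{MB} && \text{(yarn.scheduler.minimum-allocation-mb)}
\end{aligned}
```

If this reading is right, the Sqoop child job sits in the pending state while the launcher keeps printing "Heart beat", which would be consistent with the "1 Apps Pending, 1 Apps Running" summary and with the later replies, where adding YARN resources resolved the problem.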
Approach 2:
-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
user.name=oozie
#oozie.use.system.libpath=true
#import-dirSqoop=${nameNode}/sqoop/oozieImport/
sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
oozie.wf.application.path=${nameNode}/user/oozie/oozie-sqoop_action.xml

-->oozie-sqoop_action.xml: same as above

-->log (only the last 4096 bytes were shown):
ad.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
yarn.resourcemanager.am.max-attempts=2
ha.failover-controller.graceful-fence.connection.retries=1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job=
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------
Sqoop command arguments : import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
=================================================================
>>> Invoking Sqoop command line now >>>
27345 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
27456 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.2.0
27558 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
27586 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
27941 [main] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
27941 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
28601 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
28666 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
28670 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
42076 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/61d055b020bc7e036b6f8a30ac787f18/CONTACTS.jar
42088 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
42089 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
42089 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
42089 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
42100 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of CONTACTS
42472 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
45670 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat
Approach 3:
-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
user.name=oozie
oozie.use.system.libpath=true
#import-dirSqoop=${nameNode}/sqoop/oozieImport/
sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
oozie.wf.application.path=${nameNode}/user/oozie/oozie-sqoop_action.xml

-->oozie-sqoop_action.xml: same as above

-->log (only the last 4096 bytes were shown):
che-size=10000
mapreduce.job.hdfs-servers=${fs.defaultFS}
yarn.application.classpath=$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
dfs.datanode.hdfs-blocks-metadata.enabled=true
mapreduce.tasktracker.dns.nameserver=default
dfs.datanode.readahead.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
yarn.resourcemanager.am.max-attempts=2
ha.failover-controller.graceful-fence.connection.retries=1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job=
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------
Sqoop command arguments : import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
=================================================================
>>> Invoking Sqoop command line now >>>
50298 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
50938 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.2.0
51098 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
51221 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
53116 [main] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
53117 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
55470 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
55579 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
55589 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
Heart beat
77776 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/1c3dc9b68c0dae79669a15ee8b86da92/CONTACTS.jar
77792 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
77793 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
77793 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
77794 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
77815 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of CONTACTS
77960 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
83892 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Browsing HDFS: the jars required by Oozie and Sqoop are kept in /user/hdfs, and the share lib has been created as well.
The directory structure of /user/hdfs and /user/oozie is the same.
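A minimal sketch of why such an action can sit in RUNNING forever on a single node, assuming Oozie's usual launcher design: the Sqoop action runs inside a one-map "launcher" MapReduce job, and that launcher's map task submits the real Sqoop import job and then waits for it. Both jobs' containers therefore have to fit on the cluster at the same time. This is a toy first-come-first-served model, not YARN's actual scheduler; the `Job`/`schedule` names are hypothetical, the 1024 MB AM and map sizes come from the mapred-site.xml posted later in this thread, and the NodeManager capacities are made up because yarn.nodemanager.resource.memory-mb is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    am_mb: int    # ApplicationMaster container size
    task_mb: int  # size of each map task container
    tasks: int    # number of map tasks the job needs


def schedule(capacity_mb: int, jobs: List[Job]) -> Dict[str, str]:
    """Toy model of one NodeManager (hypothetical, not YARN code).

    Jobs are admitted in order; each needs its AM container before any task
    container, and granted containers are never released, which mimics the
    launcher map task blocking while it waits for the Sqoop job.
    """
    free = capacity_mb
    state: Dict[str, str] = {}
    for job in jobs:
        if free < job.am_mb:
            state[job.name] = "pending (no room for AM)"
            continue
        free -= job.am_mb
        granted = min(job.tasks, free // job.task_mb)
        free -= granted * job.task_mb
        missing = job.tasks - granted
        state[job.name] = "running" if missing == 0 else f"waiting for {missing} task container(s)"
    return state


# 1024 MB containers as in mapred-site.xml; "-m 1" means a single map task.
launcher = Job("oozie-launcher", am_mb=1024, task_mb=1024, tasks=1)
sqoop = Job("sqoop-import", am_mb=1024, task_mb=1024, tasks=1)

for capacity in (2048, 3072, 4096):  # hypothetical NodeManager memory, in MB
    print(capacity, schedule(capacity, [launcher, sqoop]))
```

With 2048 MB the launcher runs while the import job never gets its AM, which is the same shape as the "1 Apps pending, 1 Apps Running" symptom; only at 4096 MB (four 1024 MB containers) can both jobs run.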
Created 12-01-2014 02:32 AM
Configuration of YARN and HDFS:
yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property><name>yarn.acl.enable</name><value>true</value></property>
  <property><name>yarn.admin.acl</name><value>*</value></property>
  <property><name>yarn.resourcemanager.address</name><value>cldx-1414-1259:8032</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>cldx-1414-1259:8033</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>cldx-1414-1259:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>cldx-1414-1259:8031</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>cldx-1414-1259:8088</value></property>
  <property><name>yarn.resourcemanager.webapp.https.address</name><value>cldx-1414-1259:8090</value></property>
  <property><name>yarn.resourcemanager.client.thread-count</name><value>50</value></property>
  <property><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>50</value></property>
  <property><name>yarn.resourcemanager.admin.client.thread-count</name><value>1</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.increment-allocation-mb</name><value>512</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>2475</value></property>
  <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.scheduler.increment-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>4</value></property>
  <property><name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name><value>1000</value></property>
  <property><name>yarn.am.liveness-monitor.expiry-interval-ms</name><value>600000</value></property>
  <property><name>yarn.resourcemanager.am.max-attempts</name><value>2</value></property>
  <property><name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name><value>600000</value></property>
  <property><name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name><value>1000</value></property>
  <property><name>yarn.nm.liveness-monitor.expiry-interval-ms</name><value>600000</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.client.thread-count</name><value>50</value></property>
  <property><name>yarn.application.classpath</name><value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value></property>
  <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
  <property><name>yarn.scheduler.fair.user-as-default-queue</name><value>true</value></property>
  <property><name>yarn.scheduler.fair.preemption</name><value>false</value></property>
  <property><name>yarn.scheduler.fair.sizebasedweight</name><value>false</value></property>
  <property><name>yarn.scheduler.fair.assignmultiple</name><value>false</value></property>
  <property><name>yarn.resourcemanager.max-completed-applications</name><value>10000</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value></value></property>
</configuration>
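To pull the scheduler limits out of a file like this without reading it by eye, a small helper that turns a Hadoop *-site.xml into a dict may be handy. This is a sketch using only Python's standard `xml.etree.ElementTree`; the function name `parse_hadoop_conf` is hypothetical, and the sample string is a trimmed excerpt of the yarn-site.xml above. Note that this file does not contain yarn.nodemanager.resource.memory-mb, so the sketch falls back to 8192 MB, the value shipped in yarn-default.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    conf: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # findtext gives "" for an empty <value></value> and None if <value> is absent
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


# Trimmed excerpt of the yarn-site.xml above (XML declaration omitted).
YARN_SITE = """<configuration>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>2475</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value></value></property>
</configuration>"""

conf = parse_hadoop_conf(YARN_SITE)
# Not present in this file, so fall back to the yarn-default.xml value.
nm_memory = int(conf.get("yarn.nodemanager.resource.memory-mb", "8192"))
print(conf["yarn.scheduler.maximum-allocation-mb"], nm_memory)
```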
mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property><name>mapreduce.job.split.metainfo.maxsize</name><value>10000000</value></property>
  <property><name>mapreduce.job.counters.max</name><value>120</value></property>
  <property><name>mapreduce.output.fileoutputformat.compress</name><value>false</value></property>
  <property><name>mapreduce.output.fileoutputformat.compress.type</name><value>BLOCK</value></property>
  <property><name>mapreduce.output.fileoutputformat.compress.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value></property>
  <property><name>mapreduce.map.output.compress.codec</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
  <property><name>mapreduce.map.output.compress</name><value>true</value></property>
  <property><name>zlib.compress.level</name><value>DEFAULT_COMPRESSION</value></property>
  <property><name>mapreduce.task.io.sort.factor</name><value>64</value></property>
  <property><name>mapreduce.map.sort.spill.percent</name><value>0.8</value></property>
  <property><name>mapreduce.reduce.shuffle.parallelcopies</name><value>10</value></property>
  <property><name>mapreduce.task.timeout</name><value>600000</value></property>
  <property><name>mapreduce.client.submit.file.replication</name><value>1</value></property>
  <property><name>mapreduce.job.reduces</name><value>2</value></property>
  <property><name>mapreduce.task.io.sort.mb</name><value>256</value></property>
  <property><name>mapreduce.map.speculative</name><value>false</value></property>
  <property><name>mapreduce.reduce.speculative</name><value>false</value></property>
  <property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.8</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>cldx-1414-1259:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>cldx-1414-1259:19888</value></property>
  <property><name>mapreduce.jobhistory.webapp.https.address</name><value>cldx-1414-1259:19890</value></property>
  <property><name>mapreduce.jobhistory.admin.address</name><value>cldx-1414-1259:10033</value></property>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>yarn.app.mapreduce.am.staging-dir</name><value>/user</value></property>
  <property><name>yarn.app.mapreduce.am.resource.mb</name><value>1024</value></property>
  <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>1</value></property>
  <property><name>mapreduce.job.ubertask.enable</name><value>false</value></property>
  <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value></property>
  <property><name>mapreduce.map.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value></property>
  <property><name>mapreduce.reduce.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value></property>
  <property><name>mapreduce.map.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.map.cpu.vcores</name><value>1</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.reduce.cpu.vcores</name><value>1</value></property>
  <property><name>mapreduce.application.classpath</name><value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value></property>
  <property><name>mapreduce.admin.user.env</name><value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value></property>
  <property><name>mapreduce.shuffle.max.connections</name><value>80</value></property>
</configuration>
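In this mapred-site.xml every MapReduce JVM (the AM, each map and each reduce) asks YARN for a 1024 MB container, and its heap is given in raw bytes as -Xmx825955249. A short sketch to convert that into MiB and compare it with the container size; the helper name `xmx_mib` is hypothetical, and it only understands plain byte counts or a k/m/g suffix.

```python
def xmx_mib(java_opts: str) -> float:
    """Return the -Xmx heap size in MiB from a JVM option string (plain bytes or k/m/g suffix)."""
    scale = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    for opt in java_opts.split():
        if opt.startswith("-Xmx"):
            raw = opt[len("-Xmx"):]
            suffix = raw[-1].lower() if raw[-1].isalpha() else ""
            digits = raw[:-1] if suffix else raw
            return int(digits) * scale[suffix] / 1024 ** 2
    raise ValueError("no -Xmx option in: " + java_opts)


# Values from the mapred-site.xml above (identical for the AM, map and reduce JVMs).
JAVA_OPTS = "-Djava.net.preferIPv4Stack=true -Xmx825955249"
CONTAINER_MB = 1024

heap = xmx_mib(JAVA_OPTS)
print(f"-Xmx = {heap:.1f} MiB, {heap / CONTAINER_MB:.0%} of a {CONTAINER_MB} MB container")
```

This reports a heap of about 787.7 MiB, roughly 77% of the container, leaving headroom for non-heap JVM memory. The practical consequence for the Oozie case is that the launcher job and the Sqoop job it submits each need at least two of these 1024 MB containers.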
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property><name>dfs.namenode.name.dir</name><value>file:///dfs/nn</value></property>
  <property><name>dfs.namenode.servicerpc-address</name><value>cldx-1414-1259:8022</value></property>
  <property><name>dfs.https.address</name><value>cldx-1414-1259:50470</value></property>
  <property><name>dfs.https.port</name><value>50470</value></property>
  <property><name>dfs.namenode.http-address</name><value>cldx-1414-1259:50070</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.blocksize</name><value>134217728</value></property>
  <property><name>dfs.client.use.datanode.hostname</name><value>false</value></property>
  <property><name>fs.permissions.umask-mode</name><value>022</value></property>
  <property><name>dfs.namenode.acls.enabled</name><value>false</value></property>
  <property><name>dfs.client.read.shortcircuit</name><value>false</value></property>
  <property><name>dfs.domain.socket.path</name><value>/var/run/hdfs-sockets/dn</value></property>
  <property><name>dfs.client.read.shortcircuit.skip.checksum</name><value>false</value></property>
  <property><name>dfs.client.domain.socket.data.traffic</name><value>false</value></property>
  <property><name>dfs.datanode.hdfs-blocks-metadata.enabled</name><value>true</value></property>
</configuration>
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://cldx-1414-1259:8020</value></property>
  <property><name>fs.trash.interval</name><value>1</value></property>
  <property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value></property>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
  <property><name>hadoop.security.authorization</name><value>false</value></property>
  <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
  <property><name>hadoop.security.auth_to_local</name><value>DEFAULT</value></property>
  <property><name>hadoop.proxyuser.oozie.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.oozie.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.mapred.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.mapred.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.flume.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.flume.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hive.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.httpfs.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.httpfs.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hdfs.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hdfs.hosts</name><value>*</value></property>
  <property><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value></property>
  <property><name>hadoop.security.instrumentation.requires.admin</name><value>false</value></property>
  <property><name>net.topology.script.file.name</name><value>/etc/hadoop/conf.cloudera.yarn/topology.py</value></property>
  <property><name>io.file.buffer.size</name><value>65536</value></property>
  <property><name>hadoop.ssl.enabled</name><value>false</value></property>
  <property><name>hadoop.ssl.require.client.cert</name><value>false</value><final>true</final></property>
  <property><name>hadoop.ssl.keystores.factory.class</name><value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value><final>true</final></property>
  <property><name>hadoop.ssl.server.conf</name><value>ssl-server.xml</value><final>true</final></property>
  <property><name>hadoop.ssl.client.conf</name><value>ssl-client.xml</value><final>true</final></property>
</configuration>
ssl-client.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property><name>ssl.client.truststore.type</name><value>jks</value></property>
  <property><name>ssl.client.truststore.reload.interval</name><value>10000</value></property>
</configuration>
Cluster disk info:
bash-4.1$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        54G   20G   32G  39% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
Cluster RAM info:
                   total   used   free  shared  buffers  cached
Mem:                  15      9      5       0        0       3
-/+ buffers/cache:            5      9
Swap:                  3      0      3
Cluster OS info:
bash-4.1$ lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.2 (Santiago)
Release:        6.2
Codename:       Santiago
bash-4.1$ uname -a
Linux cldx-1414-1259 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
Please let me know what changes have to be made (in the configuration or anywhere else).
Created 12-09-2014 12:54 AM
Hello all,
I solved the issue by increasing the number of cores and adding one more DataNode and NodeManager.
Thanks.
Created 06-10-2017 07:04 AM
Created 01-09-2015 01:18 AM
I tried the solution below and it works perfectly for me.
1) Change the Hadoop scheduler type from the Capacity Scheduler to the Fair Scheduler. On a small cluster, each queue is assigned a fixed amount of memory (2048 MB) to complete a single MapReduce job, so if more than one MapReduce job runs in a single queue, the jobs deadlock.
Solution: add the following properties to yarn-site.xml:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>file:/%HADOOP_HOME%/etc/hadoop/fair-scheduler.xml</value>
</property>
2) By default, the total memory Hadoop allots to the NodeManager is 8 GB.
So if we run two MapReduce programs, the memory they need exceeds 8 GB and they deadlock.
Solution: increase the total memory of the NodeManager with the following properties in yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>20960</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
So if a user tries to run more than two MapReduce programs, they need to add NodeManagers or increase the total memory given to Hadoop. (Note: increasing this size reduces the memory left for the rest of the system. The properties above allow 10 MapReduce programs to run concurrently.)
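The "10 programs" figure follows from simple division. A sketch of that arithmetic, assuming each job holds one AM container plus one task container at the 1024 MB minimum allocation, and that the largest containers are 2048 MB (the maximum set above); the function names are hypothetical. 8192 MB is the default of yarn.nodemanager.resource.memory-mb in yarn-default.xml.

```python
def containers_per_node(nm_memory_mb: int, container_mb: int) -> int:
    """Number of containers of container_mb that fit in one NodeManager's memory."""
    return nm_memory_mb // container_mb


def concurrent_jobs(nm_memory_mb: int, am_mb: int, task_mb: int) -> int:
    """Jobs that can run at once if each holds exactly one AM and one task container."""
    return nm_memory_mb // (am_mb + task_mb)


DEFAULT_NM_MB = 8192   # yarn-default.xml value of yarn.nodemanager.resource.memory-mb
TUNED_NM_MB = 20960    # value suggested in this post

for nm_mb in (DEFAULT_NM_MB, TUNED_NM_MB):
    print(nm_mb,
          containers_per_node(nm_mb, 2048),     # containers at the 2048 MB maximum
          concurrent_jobs(nm_mb, 1024, 1024))   # jobs with a 1024 MB AM + 1024 MB map
```

This gives 4 large containers (or 4 minimal jobs) with the 8 GB default and 10 of each with 20960 MB. Since an Oozie action runs as a launcher job plus the job it submits, 20960 MB would hold about 5 such actions at once. The NodeManager value should also stay below the physical memory that is actually free on the host.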
Created 03-02-2015 08:33 AM
Folks:
I have the same issue while running the cluster in pseudo-distributed mode. Can someone please share how to resolve it?
Thanks,
Created 03-02-2015 09:29 AM
Increasing the maximum container memory in YARN to 8 GB solved the issue.
In the section Resource Manager Default Group -> Resource Management,
set Container Memory Maximum to 8 GB.
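As far as I can tell, Container Memory Maximum in Cloudera Manager corresponds to yarn.scheduler.maximum-allocation-mb, which was 2475 MB in the yarn-site.xml posted earlier in this thread. Below is a simplified model of how YARN treats a memory request against the minimum, increment and maximum settings, based on my understanding of Hadoop 2.x behavior: a request above the maximum is rejected, and anything else is raised to the minimum, rounded up to the increment and capped at the maximum. It is an approximation, not the scheduler's real code, and `normalize_request` is a hypothetical name.

```python
import math


def normalize_request(request_mb: int, min_mb: int, increment_mb: int, max_mb: int) -> int:
    """Approximate the container size YARN grants for a memory request (simplified model)."""
    if request_mb > max_mb:
        # YARN refuses requests larger than the maximum allocation
        raise ValueError(f"{request_mb} MB exceeds the maximum allocation of {max_mb} MB")
    rounded = math.ceil(max(request_mb, min_mb) / increment_mb) * increment_mb
    return min(rounded, max_mb)


# Scheduler limits from the yarn-site.xml earlier in this thread.
MIN_MB, INC_MB, OLD_MAX_MB = 1024, 512, 2475
NEW_MAX_MB = 8192  # Container Memory Maximum raised to 8 GB

print(normalize_request(1500, MIN_MB, INC_MB, OLD_MAX_MB))  # 1536
print(normalize_request(2400, MIN_MB, INC_MB, OLD_MAX_MB))  # 2475 (capped at the maximum)
print(normalize_request(4096, MIN_MB, INC_MB, NEW_MAX_MB))  # 4096, only valid with the 8 GB maximum
```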
Created 05-23-2015 10:27 AM
Hi,
I am also facing a similar problem with CDH 5.3 in pseudo-distributed mode. Please let me know if you were able to find a solution.
Apache Pig version 0.12.0-cdh5.3.0 (rexported)
compiled Dec 16 2014, 19:05:55
Run pig script using PigRunner.run() for Pig version 0.8+
2015-05-23 09:28:23,905 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.3.0 (rexported) compiled Dec 16 2014, 19:05:55
2015-05-23 09:28:23,909 [main] INFO org.apache.pig.Main - Logging error messages to: /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1432393487863_0...
2015-05-23 09:28:24,043 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2015-05-23 09:28:24,199 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-23 09:28:24,200 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-23 09:28:24,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020
2015-05-23 09:28:24,207 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8032
2015-05-23 09:28:25,668 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-05-23 09:28:25,877 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-05-23 09:28:26,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-05-23 09:28:26,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-05-23 09:28:26,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-05-23 09:28:26,623 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
2015-05-23 09:28:26,972 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-05-23 09:28:27,071 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-05-23 09:28:27,071 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-05-23 09:28:27,071 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-05-23 09:28:27,079 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2635559100793052600.jar
2015-05-23 09:28:34,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2635559100793052600.jar created
2015-05-23 09:28:34,692 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2015-05-23 09:28:34,737 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-05-23 09:28:34,826 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-05-23 09:28:34,874 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2015-05-23 09:28:34,920 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-05-23 09:28:34,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-05-23 09:28:35,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-05-23 09:28:35,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-23 09:28:35,114 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
2015-05-23 09:28:35,359 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-23 09:28:44,197 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-05-23 09:28:44,211 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-05-23 09:28:45,222 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-05-23 09:28:47,256 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
Heart beat
2015-05-23 09:28:52,256 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1432393487863_0004
2015-05-23 09:28:52,256 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: mapreduce.job, Service: job_1432393487863_0003, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@3a335d31)
2015-05-23 09:28:52,674 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1432397362327, maxDate=1433002162327, sequenceNumber=10, masterKeyId=2)
2015-05-23 09:29:19,549 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20141218070949/pig/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20141218070949/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
Heart beat
2015-05-23 09:29:27,684 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1432393487863_0004
2015-05-23 09:29:29,035 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1432393487863_0004/
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1432393487863_0004
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases mydata
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: mydata[1,9],mydata[-1,-1] C: R:
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1432393487863_0004
2015-05-23 09:29:31,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Created 05-23-2015 01:48 PM
Created 05-23-2015 07:34 PM