
Oozie Sqoop action in CDH 5.2: Heart beat issue

New Contributor

We have a one-node CDH 5.2 cluster with the following services running on it:
 HDFS
 Hive
 Impala
 Oozie
 Sqoop 1 Client
 YARN (MR2 Included)
 ZooKeeper
 
Tasks to perform:
1. Connect to a MySQL database and import its data into HDFS using Sqoop (run standalone from the console).
2. Trigger the same import from Oozie.

We succeeded with the first task, i.e. running the Sqoop job standalone from the console.

Problem statement: the Sqoop action in Oozie keeps printing "Heart beat" in its log for a long time, and the Oozie job and the Sqoop MapReduce job both stay in the RUNNING state for a long time.
Cluster state:
1. All services are up and running (each service tested individually).
2. No other applications are running (no MapReduce jobs).

However, task 2 produces no output.
The Oozie job stays in the RUNNING state, and the YARN application summary shows 1 app pending and 1 app running for a long time.

 

Approach 1:

 

-->oozie-sqoop_action.properties:

nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
queueName=default

user.name=hdfs

oozie.use.system.libpath=true

outputDirPath=hdfs://cldx-1414-1259:8020/sqoopTest27Nov_clusterRestored/oozieImport/

sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopTest27Nov_clusterRestored/oozieImport/

oozie.wf.application.path=${nameNode}/user/hdfs/oozie-sqoop_action.xml




-->oozie-sqoop_action.xml:

<workflow-app xmlns='uri:oozie:workflow:0.1' name='Sqoop Action XML'>
    <start to='SqoopAction' />

    <action name="SqoopAction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDirPath}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>${sqoop_command}</command>
        </sqoop>
        <ok to="end" />
        <error to="fail" />
    </action>

    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name='end' />
</workflow-app>
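Below is a sketch of an alternative form of the same action. It does not fix the hang, but it addresses three smaller issues:

- It passes the Sqoop arguments as individual `<arg>` elements, which the sqoop-action:0.2 schema accepts in place of a single `<command>`.
- It actually uses the `queueName` property. The Approach 1 properties file defines it, but the workflow never references it.
- It reads the MySQL password with Sqoop's `--password-file` option instead of putting it on the command line, which the Sqoop log warns about.

The block assumes the Approach 1 properties file, so `outputDirPath` and `queueName` are defined. The password file path `/user/hdfs/mysql.password` is hypothetical. That file should contain only the password, without a trailing newline, and be readable only by the job user.

```xml
<action name="SqoopAction">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <!-- Remove a previous import so --target-dir does not already exist -->
            <delete path="${outputDirPath}"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.compress.map.output</name>
                <value>true</value>
            </property>
            <!-- Submit the Sqoop MapReduce job to the queue named in the .properties file -->
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <!-- One <arg> per token, as an alternative to a single <command> string -->
        <arg>import</arg>
        <arg>--connect</arg>
        <arg>jdbc:mysql://172.25.38.161/test</arg>
        <arg>--username</arg>
        <arg>root</arg>
        <!-- Hypothetical HDFS file holding only the password -->
        <arg>--password-file</arg>
        <arg>/user/hdfs/mysql.password</arg>
        <arg>-m</arg>
        <arg>1</arg>
        <arg>--table</arg>
        <arg>CONTACTS</arg>
        <arg>--target-dir</arg>
        <arg>${outputDirPath}</arg>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

This block replaces only the `<action>` element. The `<start>`, `<kill>` and `<end>` nodes stay the same.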


-->log:
Container log:


2014-11-28 23:55:42,345 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1417108420351_0024_000001
2014-11-28 23:55:45,081 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
2014-11-28 23:55:45,090 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2014-11-28 23:55:45,092 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2014-11-28 23:55:45,099 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
2014-11-28 23:55:45,109 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2014-11-28 23:55:45,148 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2014-11-28 23:55:45,670 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2014-11-28 23:55:45,670 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3adde4f2)
2014-11-28 23:55:46,136 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: RM_DELEGATION_TOKEN, Service: 172.25.7.91:8032, Ident: (owner=hdfs, renewer=oozie mr token, realUser=oozie, issueDate=1417199124140, maxDate=1417803924140, sequenceNumber=101, masterKeyId=3)
2014-11-28 23:55:46,773 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
2014-11-28 23:55:46,779 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2014-11-28 23:55:46,781 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2014-11-28 23:55:46,783 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
2014-11-28 23:55:46,790 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2014-11-28 23:55:46,815 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2014-11-28 23:55:50,206 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2014-11-28 23:55:50,211 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2014-11-28 23:55:50,522 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2014-11-28 23:55:50,532 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2014-11-28 23:55:50,547 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2014-11-28 23:55:50,566 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2014-11-28 23:55:50,573 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2014-11-28 23:55:50,588 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2014-11-28 23:55:50,602 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2014-11-28 23:55:50,615 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2014-11-28 23:55:51,853 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2014-11-28 23:55:55,752 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-11-28 23:55:56,563 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-11-28 23:55:56,569 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2014-11-28 23:55:56,733 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1417108420351_0024 to jobTokenSecretManager
2014-11-28 23:55:58,180 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1417108420351_0024 because: not enabled;
2014-11-28 23:55:58,434 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1417108420351_0024 = 0. Number of splits = 1
2014-11-28 23:55:58,435 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1417108420351_0024 = 0
2014-11-28 23:55:58,436 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from NEW to INITED
2014-11-28 23:55:58,447 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1417108420351_0024.
2014-11-28 23:55:59,470 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-11-28 23:55:59,759 INFO [Socket Reader #1 for port 38852] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 38852
2014-11-28 23:56:00,011 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2014-11-28 23:56:00,063 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-11-28 23:56:00,070 INFO [IPC Server listener on 38852] org.apache.hadoop.ipc.Server: IPC Server listener on 38852: starting
2014-11-28 23:56:00,089 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at cldx-1414-1259/172.25.7.91:38852
2014-11-28 23:56:01,011 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-11-28 23:56:01,063 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2014-11-28 23:56:01,243 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2014-11-28 23:56:01,299 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2014-11-28 23:56:01,300 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2014-11-28 23:56:01,341 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2014-11-28 23:56:01,344 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2014-11-28 23:56:01,587 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 56301
2014-11-28 23:56:01,588 INFO [main] org.mortbay.log: jetty-6.1.26
2014-11-28 23:56:02,085 INFO [main] org.mortbay.log: Extract jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hadoop-yarn-common-2.5.0-cdh5.2.0.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_56301_mapreduce____.ai0e56/webapp
2014-11-28 23:56:07,915 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:56301
2014-11-28 23:56:07,923 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 56301
2014-11-28 23:56:10,911 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-11-28 23:56:10,928 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-11-28 23:56:10,953 INFO [Socket Reader #1 for port 40428] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 40428
2014-11-28 23:56:10,970 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-11-28 23:56:10,980 INFO [IPC Server listener on 40428] org.apache.hadoop.ipc.Server: IPC Server listener on 40428: starting
2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2014-11-28 23:56:11,311 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
2014-11-28 23:56:11,322 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2014-11-28 23:56:11,323 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2014-11-28 23:56:11,324 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
2014-11-28 23:56:11,326 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2014-11-28 23:56:11,339 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2014-11-28 23:56:11,371 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at cldx-1414-1259/172.25.7.91:8030
2014-11-28 23:56:11,715 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:2475, vCores:4>
2014-11-28 23:56:11,715 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.hdfs
2014-11-28 23:56:11,735 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2014-11-28 23:56:11,737 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
2014-11-28 23:56:11,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from INITED to SETUP
2014-11-28 23:56:11,831 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-11-28 23:56:12,017 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from SETUP to RUNNING
2014-11-28 23:56:12,163 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1417108420351_0024_m_000000 Task Transitioned from NEW to SCHEDULED
2014-11-28 23:56:12,208 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-11-28 23:56:12,236 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:1024, vCores:1>
2014-11-28 23:56:12,650 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1417108420351_0024, File: hdfs://cldx-1414-1259:8020/user/hdfs/.staging/job_1417108420351_0024/job_1417108420351_0024_1.jhist
2014-11-28 23:56:12,739 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-11-28 23:56:12,948 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417108420351_0024: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:1451, vCores:3> knownNMs=1
2014-11-28 23:56:17,112 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2014-11-28 23:56:17,429 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.util.RackResolver: Resolved cldx-1414-1259 to /default
2014-11-28 23:56:17,446 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1417108420351_0024_01_000002 to attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:17,466 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2014-11-28 23:56:17,920 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved cldx-1414-1259 to /default
2014-11-28 23:56:17,924 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources.
2014-11-28 23:56:18,275 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/hdfs/.staging/job_1417108420351_0024/job.xml
2014-11-28 23:56:20,314 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/ST4-4.0.4.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/ST4-4.0.4.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:20,336 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/activation-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/activation-1.1.jar This will be an error in Hadoop 2.0
org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/snappy-java-1.0.4.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/snappy-java-1.0.4.1.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,288 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/sqoop-1.4.5-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/sqoop-1.4.5-cdh5.2.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,294 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/stringtemplate-3.2.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/stringtemplate-3.2.1.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,297 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/xz-1.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/xz-1.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,300 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,302 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/oozie-hadoop-utils-2.5.0-cdh5.2.0.oozie-4.0.0-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/oozie-hadoop-utils-2.5.0-cdh5.2.0.oozie-4.0.0-cdh5.2.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,305 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/oozie-sharelib-oozie-4.0.0-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/oozie-sharelib-oozie-4.0.0-cdh5.2.0.jar This will be an error in Hadoop 2.0
2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens and #1 secret keys for NM use for launching container
2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 2
2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2014-11-28 23:56:22,565 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2014-11-28 23:56:22,573 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417108420351_0024: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:427, vCores:2> knownNMs=1
2014-11-28 23:56:22,588 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1417108420351_0024_01_000002 taskAttempt attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:22,604 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:22,610 INFO [ContainerLauncher #0] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : cldx-1414-1259:8041
2014-11-28 23:56:23,297 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1417108420351_0024_m_000000_0 : 13562
2014-11-28 23:56:23,305 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1417108420351_0024_m_000000_0] using containerId: [container_1417108420351_0024_01_000002 on NM: [cldx-1414-1259:8041]
2014-11-28 23:56:23,314 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2014-11-28 23:56:23,323 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1417108420351_0024_m_000000 Task Transitioned from SCHEDULED to RUNNING	
2014-11-28 23:56:28,300 INFO [Socket Reader #1 for port 40428] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1417108420351_0024 (auth:SIMPLE)
2014-11-28 23:56:28,341 INFO [IPC Server handler 0 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1417108420351_0024_m_000002 asked for a task
2014-11-28 23:56:28,342 INFO [IPC Server handler 0 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1417108420351_0024_m_000002 given task: attempt_1417108420351_0024_m_000000_0
2014-11-28 23:56:38,162 INFO [IPC Server handler 1 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1417108420351_0024_m_000000_0 is : 1.0
2014-11-28 23:57:05,823 INFO [IPC Server handler 6 on 40428] 
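The memory figures in this launcher log can be put together roughly, assuming nothing else was running on the cluster:

- `maxContainerCapability` is 2475 MB.
- The first `resourcelimit` is reported while only the launcher's ApplicationMaster exists.
- The second `resourcelimit` is reported after the launcher's 1024 MB map container (`mapResourceRequest`) has been assigned.

These numbers are consistent with about 2475 MB of schedulable memory on the single NodeManager:

```latex
\begin{aligned}
2475 - 1024 &= 1451 \text{ MB} && \text{(after the launcher's ApplicationMaster)}\\
1451 - 1024 &= 427 \text{ MB} && \text{(after the launcher map task that runs Sqoop)}\\
427 &< 1024 \text{ MB} && \text{(assumed AM size for Sqoop's own import job)}
\end{aligned}
```

The vCores figures drop from 4 to 3 to 2, which suggests CPU was not the bottleneck.

Suppose the import job started by the launcher also requests a 1024 MB ApplicationMaster. This is an assumption, based on the size the launcher's own AM appears to use. In that case YARN cannot place the job, so it waits as a pending application while the launcher keeps printing "Heart beat". That would match the "1 Apps pending, 1 Apps Running" summary.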


 

Approach 2:

 

-->oozie-sqoop_action.properties:

nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032

user.name=oozie

#oozie.use.system.libpath=true

#import-dirSqoop=${nameNode}/sqoop/oozieImport/

sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/

oozie.wf.application.path=${nameNode}/user/oozie/oozie-sqoop_action.xml



-->oozie-sqoop_action.xml:
same as above


-->log:
(log truncated to the last 4096 bytes)
ad.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
yarn.resourcemanager.am.max-attempts=2
ha.failover-controller.graceful-fence.connection.retries=1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job= 
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------

Sqoop command arguments :
             import
             --connect
             jdbc:mysql://172.25.38.161/test
             --username
             root
             --password
             root
             -m
             1
             --table
             CONTACTS
             --target-dir
             hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
=================================================================

>>> Invoking Sqoop command line now >>>

27345 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
27456 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.5-cdh5.2.0
27558 [main] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
27586 [main] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
27941 [main] INFO  org.apache.sqoop.manager.SqlManager  - Using default fetchSize of 1000
27941 [main] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
28601 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
28666 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
28670 [main] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
42076 [main] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/61d055b020bc7e036b6f8a30ac787f18/CONTACTS.jar
42088 [main] WARN  org.apache.sqoop.manager.MySQLManager  - It looks like you are importing from mysql.
42089 [main] WARN  org.apache.sqoop.manager.MySQLManager  - This transfer can be faster! Use the --direct
42089 [main] WARN  org.apache.sqoop.manager.MySQLManager  - option to exercise a MySQL-specific fast path.
42089 [main] INFO  org.apache.sqoop.manager.MySQLManager  - Setting zero DATETIME behavior to convertToNull (mysql)
42100 [main] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of CONTACTS
42472 [main] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
45670 [main] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat

 

Approach 3:

 

-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032

user.name=oozie

oozie.use.system.libpath=true

#import-dirSqoop=${nameNode}/sqoop/oozieImport/

sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/

oozie.wf.application.path=${nameNode}/user/oozie/oozie-sqoop_action.xml

-->oozie-sqoop_action.xml:
same as above

-->log:
(log truncated to the last 4096 bytes)
che-size=10000
mapreduce.job.hdfs-servers=${fs.defaultFS}
yarn.application.classpath=$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
dfs.datanode.hdfs-blocks-metadata.enabled=true
mapreduce.tasktracker.dns.nameserver=default
dfs.datanode.readahead.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
yarn.resourcemanager.am.max-attempts=2
ha.failover-controller.graceful-fence.connection.retries=1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job= 
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------

Sqoop command arguments :
             import
             --connect
             jdbc:mysql://172.25.38.161/test
             --username
             root
             --password
             root
             -m
             1
             --table
             CONTACTS
             --target-dir
             hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
=================================================================

>>> Invoking Sqoop command line now >>>

50298 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
50938 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.5-cdh5.2.0
51098 [main] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
51221 [main] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
53116 [main] INFO  org.apache.sqoop.manager.SqlManager  - Using default fetchSize of 1000
53117 [main] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
55470 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
55579 [main] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
55589 [main] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
Heart beat
77776 [main] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/1c3dc9b68c0dae79669a15ee8b86da92/CONTACTS.jar
77792 [main] WARN  org.apache.sqoop.manager.MySQLManager  - It looks like you are importing from mysql.
77793 [main] WARN  org.apache.sqoop.manager.MySQLManager  - This transfer can be faster! Use the --direct
77793 [main] WARN  org.apache.sqoop.manager.MySQLManager  - option to exercise a MySQL-specific fast path.
77794 [main] INFO  org.apache.sqoop.manager.MySQLManager  - Setting zero DATETIME behavior to convertToNull (mysql)
77815 [main] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of CONTACTS
77960 [main] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
83892 [main] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat

HDFS browsed dir:

[Screenshots: oozie-sqoop-issue_1.PNG, oozie-sqoop-issue_2.PNG]

The required JARs for Oozie and Sqoop are kept in /user/hdfs, and the sharelib has been created as well.

The directory structure of /user/hdfs and /user/oozie is the same.

[Screenshots of the HDFS directories: oozie-sqoop-issue_3-hdfs dir.PNG, oozie-sqoop-issue_4-hdfs dir.PNG, oozie-sqoop-issue_5-hdfs dir.PNG]

2 ACCEPTED SOLUTIONS

New Contributor

Hello all,

I solved the issue by increasing the number of cores and adding one more DataNode and NodeManager.

 

Thanks.


Contributor

Increasing the maximum container memory in YARN to 8 GB solved the issue.

In the section Resource Manager Default Group -> Resource Management, set Container Memory Maximum to 8 GB.


13 REPLIES

New Contributor

Configuration of YARN and HDFS:

 

yarn-site.xml

 

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.admin.acl</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cldx-1414-1259:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cldx-1414-1259:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cldx-1414-1259:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cldx-1414-1259:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>cldx-1414-1259:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>cldx-1414-1259:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.client.thread-count</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.client.thread-count</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.increment-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2475</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.increment-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
  </property>
  <property>
    <name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name>
    <value>1000</value>
  </property>
  <property>
    <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
    <value>1000</value>
  </property>
  <property>
    <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.sizebasedweight</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.resourcemanager.max-completed-applications</name>
    <value>10000</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value></value>
  </property>
</configuration>

 

 

mapred-site.xml

 

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>mapreduce.job.split.metainfo.maxsize</name>
    <value>10000000</value>
  </property>
  <property>
    <name>mapreduce.job.counters.max</name>
    <value>120</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>zlib.compress.level</name>
    <value>DEFAULT_COMPRESSION</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>64</value>
  </property>
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.8</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>10</value>
  </property>
  <property>
    <name>mapreduce.task.timeout</name>
    <value>600000</value>
  </property>
  <property>
    <name>mapreduce.client.submit.file.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>2</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value>
  </property>
  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.8</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>cldx-1414-1259:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>cldx-1414-1259:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.https.address</name>
    <value>cldx-1414-1259:19890</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.admin.address</name>
    <value>cldx-1414-1259:10033</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value>
  </property>
  <property>
    <name>mapreduce.admin.user.env</name>
    <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value>
  </property>
  <property>
    <name>mapreduce.shuffle.max.connections</name>
    <value>80</value>
  </property>
</configuration>
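The `-Xmx825955249` value used above for `yarn.app.mapreduce.am.command-opts`, `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` is a plain byte count, which makes it hard to compare with the 1024 MB container sizes. A small sketch to convert it (the helper name `xmx_mib` is hypothetical, and it only handles byte values without a `k`/`m`/`g` suffix):

```python
def xmx_mib(java_opts: str) -> float:
    """Return the -Xmx heap size from a JVM options string, in MiB.

    Only plain byte values (no k/m/g suffix) are handled.
    """
    for token in java_opts.split():
        if token.startswith("-Xmx"):
            return int(token[len("-Xmx"):]) / (1024 * 1024)
    raise ValueError("no -Xmx option found")


heap = xmx_mib("-Djava.net.preferIPv4Stack=true -Xmx825955249")
print(f"heap = {heap:.1f} MiB, headroom in a 1024 MB container = {1024 - heap:.1f} MiB")
```

This gives roughly 788 MiB of heap, leaving about 236 MiB of each 1024 MB container for non-heap JVM memory.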

 

 

hdfs-site.xml

 

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>cldx-1414-1259:8022</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>cldx-1414-1259:50470</value>
  </property>
  <property>
    <name>dfs.https.port</name>
    <value>50470</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>cldx-1414-1259:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.domain.socket.data.traffic</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>

 

 

core-site.xml

 

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cldx-1414-1259:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapred.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapred.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </property>
  <property>
    <name>hadoop.security.instrumentation.requires.admin</name>
    <value>false</value>
  </property>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>hadoop.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.ssl.require.client.cert</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.keystores.factory.class</name>
    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.server.conf</name>
    <value>ssl-server.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.client.conf</name>
    <value>ssl-client.xml</value>
    <final>true</final>
  </property>
</configuration>

 

 

ssl-client.xml

 

 

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>ssl.client.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.reload.interval</name>
    <value>10000</value>
  </property>
</configuration>

 

Cluster disk info:

 

bash-4.1$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              54G   20G   32G  39% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm

 

Cluster RAM info:

 

             total       used       free     shared    buffers     cached
Mem:            15          9          5          0          0          3
-/+ buffers/cache:          5          9
Swap:            3          0          3

 

Cluster OS and kernel info:

 

bash-4.1$ lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.2 (Santiago)
Release:        6.2
Codename:       Santiago
bash-4.1$
bash-4.1$ uname  -a
Linux cldx-1414-1259 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

 

Please let me know what changes need to be made (in the configuration or anywhere else).

New Contributor

Hello all,

I solved the issue by increasing the number of cores and adding one more DataNode and NodeManager.

 

Thanks.

Explorer
Hi BigdataFunda,
I have the same problem; I am working with Cloudera QuickStart 5.10.
I would like to know what you mean by "increasing cores". Do you mean increasing the vcores in the YARN configuration (yarn-site.xml)?
Thank you!

New Contributor

I tried the solution below, and it works perfectly for me.

1) Change the Hadoop scheduler type from the Capacity Scheduler to the Fair Scheduler. On a small cluster, each queue is assigned a fixed amount of memory (2048 MB) to complete a single MapReduce job, so if more than one MapReduce job runs in a single queue, the jobs deadlock.

Solution: add the following properties to yarn-site.xml:

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>file:/%HADOOP_HOME%/etc/hadoop/fair-scheduler.xml</value>
  </property>
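One common way such a deadlock shows up with Oozie can be made concrete with numbers. The sketch below is a simplified single-node model, not how the YARN schedulers are implemented: it assumes the Oozie launcher is a MapReduce job that holds one ApplicationMaster container and one map container for the whole action, that the Sqoop import it starts (`-m 1`) needs one more ApplicationMaster and one map container, that uber mode is off (`mapreduce.job.ubertask.enable=false`), and that every container asks for 1024 MB and 1 vcore, as in the mapred-site.xml posted earlier in this thread. The names `Container`, `fits` and `child_job_state` are hypothetical, and the NodeManager capacities are illustrative inputs, since `yarn.nodemanager.resource.memory-mb` does not appear in the posted client configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Container:
    memory_mb: int
    vcores: int


def fits(capacity_mb: int, capacity_vcores: int, containers: List[Container]) -> bool:
    """True if all containers can be held on one NodeManager at the same time."""
    return (sum(c.memory_mb for c in containers) <= capacity_mb
            and sum(c.vcores for c in containers) <= capacity_vcores)


def child_job_state(capacity_mb: int, capacity_vcores: int,
                    am_mb: int = 1024, map_mb: int = 1024) -> str:
    """Classify what happens to the Sqoop job launched from an Oozie action.

    Simplified model: the launcher job's containers are allocated first and
    are not released while the child job is running.
    """
    launcher = [Container(am_mb, 1), Container(map_mb, 1)]  # launcher AM + launcher map
    child_am = Container(am_mb, 1)                         # Sqoop job's AM
    child_map = Container(map_mb, 1)                       # Sqoop job's single map (-m 1)

    if not fits(capacity_mb, capacity_vcores, launcher):
        return "launcher pending"
    if not fits(capacity_mb, capacity_vcores, launcher + [child_am]):
        # One app RUNNING (the launcher, printing "Heart beat"), one app PENDING.
        return "child AM pending"
    if not fits(capacity_mb, capacity_vcores, launcher + [child_am, child_map]):
        # The child AM is up but can never get a container for its map task.
        return "child map starved"
    return "runs"


for mb, vcores in [(3072, 8), (8192, 2), (8192, 8)]:
    print(mb, vcores, child_job_state(mb, vcores))
```

In this model, either more memory or more vcores on the node lets the child job start; on a real cluster, queue limits can produce the same effect even when the node itself has room.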

2) By default, the total memory Hadoop allots to a NodeManager is 8 GB.

So if we run two MapReduce programs, the memory they use exceeds 8 GB and they deadlock.

Solution: increase the total memory of the NodeManager using the following properties in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20960</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>

So if a user tries to run more than two MapReduce programs, they need to add a NodeManager or increase the total memory given to Hadoop. (Note: increasing this size reduces the memory left for the rest of the system. The properties above are able to run 10 MapReduce programs concurrently.)
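The arithmetic behind these numbers can be checked with a back-of-the-envelope sketch. It assumes a single NodeManager, ignores vcores, and assumes every job runs exactly one ApplicationMaster container plus a fixed number of task containers; `jobs_that_fit` is a hypothetical helper, not a YARN API.

```python
def jobs_that_fit(nodemanager_mb: int, am_mb: int, task_mb: int, tasks_per_job: int = 1) -> int:
    """Number of jobs whose AM and task containers can all run at once on one NodeManager."""
    per_job_mb = am_mb + tasks_per_job * task_mb
    return nodemanager_mb // per_job_mb


# Default yarn.nodemanager.resource.memory-mb (8192) with 2048 MB containers:
print(jobs_that_fit(8192, 2048, 2048))
# 20960 MB from the properties above, with 1024 MB (minimum-size) containers:
print(jobs_that_fit(20960, 1024, 1024))
# The same 20960 MB if every container is the 2048 MB maximum size:
print(jobs_that_fit(20960, 2048, 2048))
```

Under these assumptions the three cases give 2, 10 and 5 jobs, so the figure of 10 concurrent programs holds when each job uses the 1024 MB minimum for both its ApplicationMaster and its single task; larger containers or more tasks per job lower it.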

Contributor

Folks:

 

I have the same issue while running the cluster in pseudo-distributed mode. Can someone please share how to resolve it?

 

Thanks,

Contributor

Increasing the maximum container memory in YARN to 8 GB solved the issue.

In the section Resource Manager Default Group -> Resource Management, set Container Memory Maximum to 8 GB.
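Container Memory Maximum corresponds to `yarn.scheduler.maximum-allocation-mb`, which is 2475 MB in the yarn-site.xml posted earlier in this thread. A simplified model of how a single memory request is sized under that limit is sketched below, assuming Fair Scheduler-style rounding to `yarn.scheduler.increment-allocation-mb`; `size_request` is a hypothetical helper, and the real scheduler code also handles vcores and performs further checks.

```python
import math


def size_request(request_mb: int, minimum_mb: int, increment_mb: int, maximum_mb: int) -> int:
    """Simplified sizing of one container memory request.

    Requests above the maximum are rejected outright; otherwise the request is
    raised to the minimum, rounded up to a multiple of the increment, and
    capped at the maximum.
    """
    if request_mb > maximum_mb:
        raise ValueError(f"{request_mb} MB requested, maximum allocation is {maximum_mb} MB")
    rounded = math.ceil(max(request_mb, minimum_mb) / increment_mb) * increment_mb
    return min(rounded, maximum_mb)


# Values from the posted yarn-site.xml: minimum 1024, increment 512, maximum 2475.
print(size_request(1024, 1024, 512, 2475))
print(size_request(1500, 1024, 512, 2475))
print(size_request(2400, 1024, 512, 2475))
```

These print 1024, 1536 and 2475 (the rounded 2560 MB is capped at the maximum). With the maximum raised to 8 GB (8192 MB), a 3000 MB request would be accepted and sized to 3072 MB instead of being rejected.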

New Contributor

Hi,

 

I am also facing a similar problem with CDH 5.3 in pseudo-distributed mode. Please let me know if you were able to find a solution.

 

Apache Pig version 0.12.0-cdh5.3.0 (rexported) 
compiled Dec 16 2014, 19:05:55

Run pig script using PigRunner.run() for Pig version 0.8+
2015-05-23 09:28:23,905 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.3.0 (rexported) compiled Dec 16 2014, 19:05:55
2015-05-23 09:28:23,909 [main] INFO org.apache.pig.Main - Logging error messages to: /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1432393487863_0...
2015-05-23 09:28:24,043 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2015-05-23 09:28:24,199 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-23 09:28:24,200 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-23 09:28:24,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020
2015-05-23 09:28:24,207 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8032
2015-05-23 09:28:25,668 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-05-23 09:28:25,877 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-05-23 09:28:26,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-05-23 09:28:26,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-05-23 09:28:26,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-05-23 09:28:26,623 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
2015-05-23 09:28:26,972 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-05-23 09:28:27,071 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-05-23 09:28:27,071 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-05-23 09:28:27,071 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-05-23 09:28:27,079 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2635559100793052600.jar
2015-05-23 09:28:34,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2635559100793052600.jar created
2015-05-23 09:28:34,692 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2015-05-23 09:28:34,737 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-05-23 09:28:34,826 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-05-23 09:28:34,874 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2015-05-23 09:28:34,920 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-05-23 09:28:34,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-05-23 09:28:35,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-05-23 09:28:35,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-23 09:28:35,114 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
2015-05-23 09:28:35,359 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-23 09:28:44,197 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-05-23 09:28:44,211 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-05-23 09:28:45,222 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-05-23 09:28:47,256 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
Heart beat
2015-05-23 09:28:52,256 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1432393487863_0004
2015-05-23 09:28:52,256 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: mapreduce.job, Service: job_1432393487863_0003, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@3a335d31)
2015-05-23 09:28:52,674 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1432397362327, maxDate=1433002162327, sequenceNumber=10, masterKeyId=2)
2015-05-23 09:29:19,549 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20141218070949/pig/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20141218070949/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
Heart beat
2015-05-23 09:29:27,684 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1432393487863_0004
2015-05-23 09:29:29,035 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1432393487863_0004/
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1432393487863_0004
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases mydata
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: mydata[1,9],mydata[-1,-1] C: R:
2015-05-23 09:29:29,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1432393487863_0004
2015-05-23 09:29:31,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat

Contributor
Hi,

Can you please confirm how much physical RAM the machine hosting the pseudo-distributed cluster has?

I had a similar issue when I moved from 5.2 to 5.3. All I did was use the default values, especially the ResourceManager settings for the available virtual cores and the memory available to containers.

I hope this helps. If not, please post the requested information and I will do my best to help.

Thanks,
Kabeer.

New Contributor
Hi,
My system has at most 4 GB of physical RAM, and I am not using Cloudera Manager.
Interestingly, if I execute the script from the Grunt shell it works well, but if I execute the same script via the Pig editor in Hue, it freezes with "Heart beat" messages.
Please let me know if I need to change or add any properties in yarn-site.xml. Currently I am running the system with all the default properties.
Thanks, vibhor