Created on 12-01-2014 02:28 AM - edited 09-16-2022 02:14 AM
We have a single-node cluster running CDH 5.2 with the following services on it:
HDFS
Hive
Impala
Oozie
Sqoop 1 Client
YARN (MR2 Included)
ZooKeeper
Tasks to perform:
1. Connect to a MySQL DB and import the data into HDFS using Sqoop (run independently from the console).
2. Trigger the above task using Oozie.
We were successful with the first task, i.e. running the Sqoop job independently on the console.
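For reference, the console run from task 1 can be sketched as follows. This is a minimal sketch that assumes the console command used the same arguments as the sqoop_command property from the Oozie attempts (the target directory is the one from Approach 2); the helper names build_sqoop_argv and run_import are hypothetical. Splitting on whitespace mirrors how the Oozie Sqoop action treats the command element, as far as I know.

```python
import shlex
import shutil
import subprocess

# Arguments copied from the sqoop_command property in this post. Whether the
# console run used exactly these arguments is an assumption.
SQOOP_COMMAND = (
    "import --connect jdbc:mysql://172.25.38.161/test "
    "--username root --password root -m 1 --table CONTACTS "
    "--target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/"
)


def build_sqoop_argv(command):
    """Split the property value into an argv list, prefixed with the sqoop binary."""
    return ["sqoop"] + shlex.split(command)


def run_import(command=SQOOP_COMMAND):
    """Hypothetical helper: run the import if sqoop is on PATH, else print the command."""
    argv = build_sqoop_argv(command)
    if shutil.which("sqoop") is None:
        print("sqoop not found on PATH; would run:", " ".join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode
```

Calling run_import() on the cluster node would start the same import that the Oozie action is expected to trigger.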
Problem statement: The Sqoop action in Oozie keeps printing "Heart beat" in the log for a long time, and the Oozie job / Sqoop MapReduce job stays in the running state for a long time.
Scenario of cluster:
1. All services are up and running (each service tested individually).
2. No applications are running (no MapReduce jobs).
But with task 2, no output is generated.
The Oozie job stays in the running state for a long time. The YARN application summary also shows 1 app pending and 1 app running for a long time.
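The pending/running counts from the YARN summary can also be read programmatically next to the cluster memory figures. The sketch below uses the ResourceManager REST endpoint /ws/v1/cluster/metrics; the port 8088 is the usual default and is an assumption here, as are the helper names summarize_metrics and fetch_metrics.

```python
import json
from urllib.request import urlopen

# Assumption: the ResourceManager web service listens on the default port 8088
# on the host named in this post; adjust if the cluster is configured differently.
RM_METRICS_URL = "http://cldx-1414-1259:8088/ws/v1/cluster/metrics"


def summarize_metrics(payload):
    """Format the pending/running app counts and memory figures from the RM response."""
    m = payload["clusterMetrics"]
    return (
        f"apps pending={m['appsPending']} running={m['appsRunning']} | "
        f"memory MB available={m['availableMB']} of total={m['totalMB']}"
    )


def fetch_metrics(url=RM_METRICS_URL):
    """Fetch the cluster metrics JSON from the ResourceManager."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage on a machine that can reach the ResourceManager:
#   print(summarize_metrics(fetch_metrics()))
```

Running this while the workflow is stuck would show whether any memory is still available for a further container.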
Approach 1:
-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
queueName=default
user.name=hdfs
oozie.use.system.libpath=true
outputDirPath=hdfs://cldx-1414-1259:8020/sqoopTest27Nov_clusterRestored/oozieImport/
sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopTest27Nov_clusterRestored/oozieImport/
oozie.wf.application.path=${nameNode}/user/hdfs/oozie-sqoop_action.xml
-->oozie-sqoop_action.xml:
<workflow-app xmlns='uri:oozie:workflow:0.1' name='Sqoop Action XML'>
    <start to='SqoopAction' />
    <action name="SqoopAction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDirPath}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>${sqoop_command}</command>
        </sqoop>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
-->log:
Container log:
2014-11-28 23:55:42,345 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1417108420351_0024_000001
2014-11-28 23:55:45,081 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
2014-11-28 23:55:45,090 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-11-28 23:55:45,092 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.
2014-11-28 23:55:45,099 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring. 2014-11-28 23:55:45,109 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring. 2014-11-28 23:55:45,148 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 2014-11-28 23:55:45,670 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens: 2014-11-28 23:55:45,670 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3adde4f2) 2014-11-28 23:55:46,136 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: RM_DELEGATION_TOKEN, Service: 172.25.7.91:8032, Ident: (owner=hdfs, renewer=oozie mr token, realUser=oozie, issueDate=1417199124140, maxDate=1417803924140, sequenceNumber=101, masterKeyId=3) 2014-11-28 23:55:46,773 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring. 2014-11-28 23:55:46,779 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 2014-11-28 23:55:46,781 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring. 2014-11-28 23:55:46,783 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring. 2014-11-28 23:55:46,790 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring. 
2014-11-28 23:55:46,815 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 2014-11-28 23:55:50,206 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null 2014-11-28 23:55:50,211 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 2014-11-28 23:55:50,522 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler 2014-11-28 23:55:50,532 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher 2014-11-28 23:55:50,547 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher 2014-11-28 23:55:50,566 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher 2014-11-28 23:55:50,573 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler 2014-11-28 23:55:50,588 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher 2014-11-28 23:55:50,602 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter 2014-11-28 23:55:50,615 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter 2014-11-28 23:55:51,853 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler 2014-11-28 23:55:55,752 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2014-11-28 23:55:56,563 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2014-11-28 23:55:56,569 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started 2014-11-28 23:55:56,733 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1417108420351_0024 to jobTokenSecretManager 2014-11-28 23:55:58,180 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1417108420351_0024 because: not enabled; 2014-11-28 23:55:58,434 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1417108420351_0024 = 0. Number of splits = 1 2014-11-28 23:55:58,435 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1417108420351_0024 = 0 2014-11-28 23:55:58,436 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from NEW to INITED 2014-11-28 23:55:58,447 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1417108420351_0024. 
2014-11-28 23:55:59,470 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue 2014-11-28 23:55:59,759 INFO [Socket Reader #1 for port 38852] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 38852 2014-11-28 23:56:00,011 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server 2014-11-28 23:56:00,063 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-11-28 23:56:00,070 INFO [IPC Server listener on 38852] org.apache.hadoop.ipc.Server: IPC Server listener on 38852: starting 2014-11-28 23:56:00,089 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at cldx-1414-1259/172.25.7.91:38852 2014-11-28 23:56:01,011 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2014-11-28 23:56:01,063 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined 2014-11-28 23:56:01,243 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) 2014-11-28 23:56:01,299 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce 2014-11-28 23:56:01,300 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static 2014-11-28 23:56:01,341 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/* 2014-11-28 23:56:01,344 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/* 2014-11-28 23:56:01,587 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 56301 2014-11-28 23:56:01,588 
INFO [main] org.mortbay.log: jetty-6.1.26 2014-11-28 23:56:02,085 INFO [main] org.mortbay.log: Extract jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hadoop-yarn-common-2.5.0-cdh5.2.0.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_56301_mapreduce____.ai0e56/webapp 2014-11-28 23:56:07,915 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:56301 2014-11-28 23:56:07,923 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 56301 2014-11-28 23:56:10,911 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules 2014-11-28 23:56:10,928 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue 2014-11-28 23:56:10,953 INFO [Socket Reader #1 for port 40428] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 40428 2014-11-28 23:56:10,970 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-11-28 23:56:10,980 INFO [IPC Server listener on 40428] org.apache.hadoop.ipc.Server: IPC Server listener on 40428: starting 2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true 2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3 2014-11-28 23:56:11,055 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33 2014-11-28 23:56:11,311 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring. 2014-11-28 23:56:11,322 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 
2014-11-28 23:56:11,323 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring. 2014-11-28 23:56:11,324 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring. 2014-11-28 23:56:11,326 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring. 2014-11-28 23:56:11,339 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 2014-11-28 23:56:11,371 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at cldx-1414-1259/172.25.7.91:8030 2014-11-28 23:56:11,715 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:2475, vCores:4> 2014-11-28 23:56:11,715 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.hdfs 2014-11-28 23:56:11,735 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500 2014-11-28 23:56:11,737 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500 2014-11-28 23:56:11,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from INITED to SETUP 2014-11-28 23:56:11,831 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP 2014-11-28 23:56:12,017 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1417108420351_0024Job Transitioned from SETUP to RUNNING 2014-11-28 23:56:12,163 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1417108420351_0024_m_000000 Task Transitioned from NEW to SCHEDULED 2014-11-28 23:56:12,208 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED 2014-11-28 23:56:12,236 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:1024, vCores:1> 2014-11-28 23:56:12,650 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1417108420351_0024, File: hdfs://cldx-1414-1259:8020/user/hdfs/.staging/job_1417108420351_0024/job_1417108420351_0024_1.jhist 2014-11-28 23:56:12,739 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0 2014-11-28 23:56:12,948 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417108420351_0024: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:1451, vCores:3> knownNMs=1 2014-11-28 23:56:17,112 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1 2014-11-28 23:56:17,429 INFO [RMCommunicator Allocator] org.apache.hadoop.yarn.util.RackResolver: Resolved cldx-1414-1259 to /default 2014-11-28 23:56:17,446 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1417108420351_0024_01_000002 to attempt_1417108420351_0024_m_000000_0 2014-11-28 23:56:17,466 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 
ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0 2014-11-28 23:56:17,920 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved cldx-1414-1259 to /default 2014-11-28 23:56:17,924 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources. 2014-11-28 23:56:18,275 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/hdfs/.staging/job_1417108420351_0024/job.xml 2014-11-28 23:56:20,314 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/ST4-4.0.4.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/ST4-4.0.4.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:20,336 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/activation-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/activation-1.1.jar This will be an error in Hadoop 2.0 org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/snappy-java-1.0.4.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/snappy-java-1.0.4.1.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,288 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/sqoop-1.4.5-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) 
hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/sqoop-1.4.5-cdh5.2.0.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,294 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/stringtemplate-3.2.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/stringtemplate-3.2.1.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,297 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/xz-1.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/xz-1.0.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,300 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,302 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/oozie-hadoop-utils-2.5.0-cdh5.2.0.oozie-4.0.0-cdh5.2.0.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/oozie-hadoop-utils-2.5.0-cdh5.2.0.oozie-4.0.0-cdh5.2.0.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,305 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.util.MRApps: cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/oozie-sharelib-oozie-4.0.0-cdh5.2.0.jar conflicts with 
cache file (mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/oozie/oozie-sharelib-oozie-4.0.0-cdh5.2.0.jar This will be an error in Hadoop 2.0 2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens and #1 secret keys for NM use for launching container 2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 2 2014-11-28 23:56:21,306 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData 2014-11-28 23:56:22,565 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED 2014-11-28 23:56:22,573 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417108420351_0024: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:427, vCores:2> knownNMs=1 2014-11-28 23:56:22,588 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1417108420351_0024_01_000002 taskAttempt attempt_1417108420351_0024_m_000000_0 2014-11-28 23:56:22,604 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1417108420351_0024_m_000000_0 2014-11-28 23:56:22,610 INFO [ContainerLauncher #0] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : cldx-1414-1259:8041 2014-11-28 23:56:23,297 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1417108420351_0024_m_000000_0 : 13562 
2014-11-28 23:56:23,305 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1417108420351_0024_m_000000_0] using containerId: [container_1417108420351_0024_01_000002 on NM: [cldx-1414-1259:8041] 2014-11-28 23:56:23,314 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417108420351_0024_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING 2014-11-28 23:56:23,323 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1417108420351_0024_m_000000 Task Transitioned from SCHEDULED to RUNNING 2014-11-28 23:56:28,300 INFO [Socket Reader #1 for port 40428] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1417108420351_0024 (auth:SIMPLE) 2014-11-28 23:56:28,341 INFO [IPC Server handler 0 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1417108420351_0024_m_000002 asked for a task 2014-11-28 23:56:28,342 INFO [IPC Server handler 0 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1417108420351_0024_m_000002 given task: attempt_1417108420351_0024_m_000000_0 2014-11-28 23:56:38,162 INFO [IPC Server handler 1 on 40428] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1417108420351_0024_m_000000_0 is : 1.0 2014-11-28 23:57:05,823 INFO [IPC Server handler 6 on 40428]
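The container log above reports several resource figures (maxContainerCapability, mapResourceRequest, and the resourcelimit headroom before and after the map container was assigned). The sketch below only extracts those numbers so they can be compared side by side; it does not establish why the second application stays pending. The function name resource_figures is hypothetical, and SAMPLE holds abbreviated excerpts of the log lines.

```python
import re

# Matches the three resource notations that appear in the MRAppMaster log:
#   maxContainerCapability: <memory:N, vCores:M>
#   mapResourceRequest:<memory:N, vCores:M>
#   resourcelimit=<memory:N, vCores:M>
RESOURCE_RE = re.compile(
    r"(maxContainerCapability|mapResourceRequest|resourcelimit)"
    r"\s*[:=]\s*<memory:(\d+), vCores:(\d+)>"
)


def resource_figures(log_text):
    """Return (name, memory_mb, vcores) tuples in the order they appear in the log."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in RESOURCE_RE.finditer(log_text)
    ]


# Abbreviated excerpts from the container log above (timestamps and class
# prefixes shortened for readability).
SAMPLE = """\
RMContainerAllocator: maxContainerCapability: <memory:2475, vCores:4>
RMContainerAllocator: mapResourceRequest:<memory:1024, vCores:1>
RMContainerRequestor: getResources() ... resourcelimit=<memory:1451, vCores:3> knownNMs=1
RMContainerRequestor: getResources() ... resourcelimit=<memory:427, vCores:2> knownNMs=1
"""

for name, memory_mb, vcores in resource_figures(SAMPLE):
    print(f"{name}: {memory_mb} MB, {vcores} vCores")
```

In this log the reported headroom drops from 1451 MB to 427 MB once the 1024 MB map container is assigned.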
Approach 2:
-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
user.name=oozie
#oozie.use.system.libpath=true
#import-dirSqoop=${nameNode}/sqoop/oozieImport/
sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
oozie.wf.application.path=${nameNode}/user/oozie/oozie-sqoop_action.xml
-->oozie-sqoop_action.xml: same as above
-->log:
Showing 4096 bytes. Click here for full log
ad.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
yarn.resourcemanager.am.max-attempts=2
ha.failover-controller.graceful-fence.connection.retries=1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job=
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------
Sqoop command arguments :
import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
=================================================================
>>> Invoking Sqoop command line now >>>
27345 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
27456 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.2.0
27558 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
27586 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
27941 [main] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
27941 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
28601 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
28666 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
28670 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
42076 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/61d055b020bc7e036b6f8a30ac787f18/CONTACTS.jar
42088 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
42089 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
42089 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
42089 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
42100 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of CONTACTS
42472 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
45670 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat
Approach 3:
-->oozie-sqoop_action.properties:
nameNode=hdfs://cldx-1414-1259:8020
jobTracker=cldx-1414-1259:8032
user.name=oozie
oozie.use.system.libpath=true
#import-dirSqoop=${nameNode}/sqoop/oozieImport/
sqoop_command=import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
oozie.wf.application.path=${nameNode}/user/oozie/oozie-sqoop_action.xml
-->oozie-sqoop_action.xml: same as above
-->log:
Showing 4096 bytes. Click here for full log
che-size=10000
mapreduce.job.hdfs-servers=${fs.defaultFS}
yarn.application.classpath=$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
dfs.datanode.hdfs-blocks-metadata.enabled=true
mapreduce.tasktracker.dns.nameserver=default
dfs.datanode.readahead.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
yarn.resourcemanager.am.max-attempts=2
ha.failover-controller.graceful-fence.connection.retries=1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job=
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------
Sqoop command arguments :
import --connect jdbc:mysql://172.25.38.161/test --username root --password root -m 1 --table CONTACTS --target-dir hdfs://cldx-1414-1259:8020/sqoopOozieTest/oozieImport/
=================================================================
>>> Invoking Sqoop command line now >>>
50298 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
50938 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.2.0
51098 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
51221 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
53116 [main] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
53117 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
55470 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
55579 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `CONTACTS` AS t LIMIT 1
55589 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
Heart beat
77776 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/1c3dc9b68c0dae79669a15ee8b86da92/CONTACTS.jar
77792 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
77793 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
77793 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
77794 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
77815 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of CONTACTS
77960 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
83892 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat Heart beat
HDFS browsed dir:
The required jars for Oozie and Sqoop are kept in /user/hdfs, and the share lib has been created as well.
The directory structure of /user/hdfs and /user/oozie is the same.
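Because the same jars exist both under /user/hdfs/lib and in the Oozie share lib, the Approach 1 container log carries "conflicts with cache file" warnings for each duplicate. A minimal sketch for listing those duplicates from a saved log is shown below; the function name conflicting_jars is hypothetical, and SAMPLE is a single warning copied from the log.

```python
import posixpath
import re

# Pattern for the MRApps warning emitted when two distributed-cache entries
# resolve to the same file name.
CONFLICT_RE = re.compile(
    r"cache file \(mapreduce\.job\.cache\.files\) (\S+) conflicts with "
    r"cache file \(mapreduce\.job\.cache\.files\) (\S+)"
)


def conflicting_jars(log_text):
    """Return (jar_name, first_path, second_path) for every conflict warning."""
    return [
        (posixpath.basename(first), first, second)
        for first, second in CONFLICT_RE.findall(log_text)
    ]


# One warning copied from the Approach 1 container log.
SAMPLE = (
    "2014-11-28 23:56:21,297 WARN [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.util.MRApps: cache file "
    "(mapreduce.job.cache.files) hdfs://cldx-1414-1259:8020/user/hdfs/lib/xz-1.0.jar "
    "conflicts with cache file (mapreduce.job.cache.files) "
    "hdfs://cldx-1414-1259:8020/user/oozie/share/lib/lib_20141121170802/sqoop/xz-1.0.jar "
    "This will be an error in Hadoop 2.0"
)

for jar, first, second in conflicting_jars(SAMPLE):
    print(f"{jar}: {first} vs {second}")
```

Feeding the full container log to conflicting_jars would list every jar that is present in both locations.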