Sqoop jobs are not running

avatar
Rising Star

I am trying to run a Sqoop job, but it gets stuck without throwing an error. I am unable to see any YARN logs for this Sqoop job.

What can I do to identify the issue here?

The last part of the log looks like this:

16/09/13 05:05:35 INFO db.DBInputFormat: Using read commited transaction isolation
16/09/13 05:05:35 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
16/09/13 05:05:35 INFO mapreduce.JobSubmitter: number of splits:1
16/09/13 05:05:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473064502809_0029
16/09/13 05:05:36 INFO impl.YarnClientImpl: Submitted application application_1473064502809_0029
16/09/13 05:05:36 INFO mapreduce.Job: The url to track the job: http://**********/proxy/application_1473064502809_0029/
16/09/13 05:05:36 INFO mapreduce.Job: Running job: job_1473064502809_0029

1 ACCEPTED SOLUTION

avatar
Super Guru

@Gaurab D

Can you please share the logs for the job? You should find more information either in Ambari or under the /var/log/sqoop folder, or maybe /var/log/yarn.

11 REPLIES

avatar
Rising Star

Below is the log from the /var/log/yarn folder immediately after submitting the job.

2016-09-14 05:41:51,852 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService (IPC Server handler 30 on 8032): Allocated new applicationId: 34
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService (IPC Server handler 30 on 8032): Application with id 34 submitted by user hadoop
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl (AsyncDispatcher event handler): Storing application with id application_1473064502809_0034
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger (IPC Server handler 30 on 8032): USER=hadoop IP=********* OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1473064502809_0034
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl (AsyncDispatcher event handler): application_1473064502809_0034 State change from NEW to NEW_SAVING
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore (AsyncDispatcher event handler): Storing info for app: application_1473064502809_0034
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl (AsyncDispatcher event handler): application_1473064502809_0034 State change from NEW_SAVING to SUBMITTED
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): Application added - appId: application_1473064502809_0034 user: hadoop leaf-queue of parent: root #applications: 10
2016-09-14 05:41:53,826 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Accepted application application_1473064502809_0034 from user: hadoop, in queue: default
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl (AsyncDispatcher event handler): appattempt_1473064502809_0034_000001 amEmrLabels: CORE
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl (AsyncDispatcher event handler): application_1473064502809_0034 State change from SUBMITTED to ACCEPTED
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService (AsyncDispatcher event handler): Registering app attempt : appattempt_1473064502809_0034_000001
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl (AsyncDispatcher event handler): appattempt_1473064502809_0034_000001 State change from NEW to SUBMITTED
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Application application_1473064502809_0034 from user: hadoop activated in queue: default
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Application added - appId: application_1473064502809_0034 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@48d9e852, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 10 #queue-pending-applications: 0 #queue-active-applications: 10
2016-09-14 05:41:53,827 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Added Application Attempt appattempt_1473064502809_0034_000001 to scheduler from user hadoop in queue default
2016-09-14 05:41:53,828 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl (AsyncDispatcher event handler): appattempt_1473064502809_0034_000001 State change from SUBMITTED to SCHEDULED

avatar
Super Guru

The log shows the application attempt going from SUBMITTED to SCHEDULED, so it appears the job is waiting in the queue. What else do you have running on the cluster? Can you check what you have allocated to your YARN queue, and the credentials of the user running the job (likely sqoop)? Does this user have enough resources allocated to run YARN jobs? It seems like a queue issue.
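For anyone reading a ResourceManager log like the one above, the "State change from X to Y" lines can be pulled out mechanically. The following is only a rough sketch in Python: the function names are made up for illustration, and treating ACCEPTED or SCHEDULED as "still waiting for a container" is an assumption based on the log in this thread, not an official rule.

```python
import re

# Matches ResourceManager lines such as
# "application_1473064502809_0034 State change from SUBMITTED to ACCEPTED"
STATE_CHANGE = re.compile(
    r"(?P<id>(?:application|appattempt)_\d+_\d+(?:_\d+)?) "
    r"State change from (?P<old>\w+) to (?P<new>\w+)"
)


def last_states(log_text):
    """Return the most recent state seen for each application/attempt id."""
    states = {}
    for match in STATE_CHANGE.finditer(log_text):
        states[match.group("id")] = match.group("new")
    return states


def waiting_for_resources(states):
    """Ids whose latest state suggests no container has been allocated yet.

    Assumption: ACCEPTED (application) and SCHEDULED (attempt) mean
    "queued, waiting for the scheduler".
    """
    return sorted(i for i, s in states.items() if s in ("ACCEPTED", "SCHEDULED"))


# A few lines copied from the log posted in this thread.
sample = """
2016-09-14 05:41:53,826 INFO RMAppImpl: application_1473064502809_0034 State change from NEW_SAVING to SUBMITTED
2016-09-14 05:41:53,827 INFO RMAppImpl: application_1473064502809_0034 State change from SUBMITTED to ACCEPTED
2016-09-14 05:41:53,827 INFO RMAppAttemptImpl: appattempt_1473064502809_0034_000001 State change from NEW to SUBMITTED
2016-09-14 05:41:53,828 INFO RMAppAttemptImpl: appattempt_1473064502809_0034_000001 State change from SUBMITTED to SCHEDULED
"""

states = last_states(sample)
for app_id in waiting_for_resources(states):
    print(app_id, "is still waiting, last state:", states[app_id])
```

If the last state for the attempt never moves past SCHEDULED, the scheduler has not handed out a container, which points at queue capacity or queue-to-node configuration rather than at Sqoop itself.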

avatar
Rising Star

Thanks @mqureshi, it was indeed a queue issue. I had earlier configured the Capacity Scheduler so that both datanodes were associated with one queue, which I wasn't using when submitting this job. Once I removed that configuration, the job ran just fine. Many thanks for your advice.

avatar

To get logs of your job, the easiest way may be:

yarn logs -applicationId application_1473064502809_0029

This should give you more information. If your job has not started, it may be a resource issue on your queue.

avatar
Rising Star

Whenever I try to check the logs using the above command, it shows the message below.

"/var/log/yarn/apps/hadoop/logs/application_1473064502809_0029 does not exist

Log aggregation has not completed or is not enabled."

avatar

Then it looks like your job did not start at all. I think the job is waiting for resources in your queue or something similar. If you can, I'd go to the YARN ResourceManager UI to check what's going on.
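If the UI is not handy, the same information is exposed by the ResourceManager REST API under /ws/v1/cluster/apps/{appid}. Below is a minimal Python sketch, assuming the ResourceManager web address is something like rm-host:8088 (a placeholder, not taken from this thread) and that the response carries the usual "app" object with fields such as state, queue and diagnostics. The sample payload is hand-written for illustration and is not real output from this cluster.

```python
import json
from urllib.request import urlopen


def app_url(rm_address, app_id):
    """Build the ResourceManager REST URL for one application."""
    return f"http://{rm_address}/ws/v1/cluster/apps/{app_id}"


def fetch_app(rm_address, app_id, timeout=10):
    """Fetch the application report as a dict (needs network access to the RM)."""
    with urlopen(app_url(rm_address, app_id), timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload):
    """Keep only the fields that help explain why a job is not starting."""
    app = payload["app"]
    wanted = ("state", "queue", "runningContainers", "allocatedMB", "diagnostics")
    return {key: app.get(key) for key in wanted}


# Hand-written example payload (illustrative only).
sample_payload = {
    "app": {
        "id": "application_1473064502809_0029",
        "state": "ACCEPTED",
        "queue": "default",
    }
}

print(summarize(sample_payload))

# Against a live cluster (rm-host:8088 is a placeholder address):
# print(summarize(fetch_app("rm-host:8088", "application_1473064502809_0029")))
```

An application that stays in ACCEPTED with no containers is consistent with the job waiting for resources in its queue.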

avatar
Rising Star

My guess is that this is because the job is not in the SCHEDULED state but in the RUNNING state.

avatar

If the YARN UI shows your job as running, you can open the job in the UI and check the individual container logs (or go directly to the nodes where the containers are running and check each container's logs manually). The yarn logs command only works once your job has finished. If your job has been running for a very long time and that is not expected, you can consider killing it and then using the logs command again.