
Running a Sqoop script in the background appears to 'hang' but works when running in the foreground

Explorer

Consider the following bash script for Sqoop:

#!/bin/sh
connection_string="jdbc:sqlserver://remoteserver.somehwere.location-asp.com:1433;database=idistrict"
user_name="OPSUser"
db_password="OPSReader"
sqoop_cmd="list-databases"
sqoop $sqoop_cmd --connect $connection_string --username $user_name --password $db_password

We can run this just fine in the foreground, i.e.:

./sqoop_test.sh

But when we run it in the background, like so:

./sqoop_test.sh &

the script appears to 'hang' when it kicks off the actual sqoop command, i.e. nothing happens at all.

Adding -x to the #!/bin/sh line shows that we reach the last line of the script, and then nothing...
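
For anyone reproducing this, one way to capture that trace for both modes without editing the shebang line (the log file names here are arbitrary):

# Trace the script in the foreground and in the background, writing the trace to a file
bash -x ./sqoop_test.sh 2>trace_fg.log
bash -x ./sqoop_test.sh >/dev/null 2>trace_bg.log &

# Compare where the two traces stop
diff trace_fg.log trace_bg.log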

We have tried several different variations, such as:

nohup bash sqoop.sh > results.txt 2>&1 &

./sqoop.sh &> /dev/null &

and switching to #!/bin/bash.

Any ideas? The odd thing is that the exact same script works fine both foregrounded and backgrounded on a different cluster. /etc/profile and .bash_profile don't appear to have any major differences.
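
For comparing the two clusters, a rough sketch of how to diff the login environments and shell limits (nodeA and nodeB are placeholder hostnames for a node on each cluster):

# Compare the environment seen by a shell on each cluster
diff <(ssh nodeA 'env | sort') <(ssh nodeB 'env | sort')

# Compare per-user shell limits as well
diff <(ssh nodeA 'ulimit -a') <(ssh nodeB 'ulimit -a')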

1 ACCEPTED SOLUTION

Explorer

@Alex Miller I was able to reproduce this, and I personally had luck with screen, as well as with placing the "&" inside my test script itself, at the end of the sqoop command, rather than backgrounding the script at invocation time (i.e. ./sqoop.sh &).

The /dev/null redirect was also successful for me with Accumulo in place.

The customer had apparently already removed the Accumulo bits before they had a chance to test my suggestions, since they weren't using it anymore anyway.

So I really think there isn't a bug and we are hitting some bash-isms here more than anything else.

Thanks, all, for the tips.

View solution in original post

13 REPLIES

@Kent Baxley

Is this cluster set up with queues?

Can you check if sqoop is waiting on other jobs to finish?

Explorer

@Neeraj - No other jobs are waiting to finish, and we can run this pretty much at will in the foreground without anything getting stuck.

Mentor

Add `whoami` and `hostname` to the script and see what prints out. I'd also add "2>&1 | tee -a log", i.e. redirect the console output to a file, so you can see the output both in the foreground and in the background. It should give you some insight into what's happening. What specifically is the reason for running it in the background, @Kent Baxley?
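
For reference, a minimal sketch of the script instrumented along those lines (the log file name is arbitrary):

#!/bin/sh
# Show who and where we are, so foreground and background runs can be compared
echo "whoami $(whoami)"
echo "hostname $(hostname)"

connection_string="jdbc:sqlserver://remoteserver.somehwere.location-asp.com:1433;database=idistrict"
user_name="OPSUser"
db_password="OPSReader"
sqoop_cmd="list-databases"

# Send stdout and stderr to the console and append them to a log file
sqoop $sqoop_cmd --connect $connection_string --username $user_name --password $db_password 2>&1 | tee -a sqoop_test.log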

Mentor

Check limits on the user in the background and in the foreground. There may be an OS limit on background processes.

I would recommend using screen. It gives you all of the benefits of a background job with all of the benefits of a job running in the foreground as well: https://wiki.archlinux.org/index.php/GNU_Screen
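
As a rough illustration of that screen workflow (the session name is arbitrary):

# Start a named screen session and run the script inside it
screen -S sqoop_job
./sqoop_test.sh

# Detach with Ctrl-a d; the script keeps running inside the session
# Reattach later to check on it
screen -r sqoop_job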

Explorer

@Artem Ervits The reason behind the backgrounding is that there are quite a few tables with 2+ million records, and they would like to start a sqoop job in the background after hours (there has to be a better way to do this, in my opinion).

Foregrounded:

-bash-4.1$ ./sample_sqoop.sh

whoami sboddu

hostname node2.example.com

2015-11-05 19:15:03,027 INFO - [main:] ~ Running Sqoop version: 1.4.6.2.3.0.0-2557 (Sqoop:92)
2015-11-05 19:15:03,043 WARN - [main:] ~ Setting your password on the command-line is insecure. Consider using -P instead. (BaseSqoopTool:1021)

2015-11-05 19:15:03,248 INFO - [main:] ~ Using default fetchSize of 1000 (SqlManager:98)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

master

tempdb

model

msdb

ReportServer

Idistrict_Distributer

idistrict

iDistrict_Audit

iDistrict_SlimDB

iDistrict_Reports

ReportServerTempDB

Idistrict_Replication

iDistrict_Attachment FTL

Backgrounded:

-bash-4.1$ ./sample_sqoop.sh&
[2] 33320
-bash-4.1$

whoami sboddu

hostname node2.example.com

[2]+ Stopped ./sample_sqoop.sh

Mentor

You're sqooping master, model, reportdb, etc.; I'm not sure you need those, so I would limit the job to just the tables you need. Other than that, please check ulimit for the user executing the job in the foreground and in the background: http://www.commandlinefu.com/commands/view/9893/find-ulimit-values-of-currently-running-process.
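
For reference, the technique behind that link boils down to reading the limits of the running process (33320 here is just the PID reported for the backgrounded job above; substitute the actual PID):

# Limits of the current shell
ulimit -a

# Limits of an already-running process, by PID
cat /proc/33320/limits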

Explorer

@Artem Ervits It turns out that the deciding factor was having the Accumulo client installed on the machine alongside sqoop.

With the Accumulo client in the mix, the sqoop script, if invoked in the background, would go into a Stopped state and could only resume if the script were foregrounded with the "fg" command.

Uninstalling the Accumulo client is what ultimately worked around / fixed the issue.

Not sure if this is a bug or due to the fact that sqoop itself is a bash script that calls another script, which in turn sources the configure-sqoop script.
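
One possible explanation, offered as an assumption rather than a confirmed cause: a background job that tries to read from its controlling terminal is sent SIGTTIN and suspended, which would match the "[2]+ Stopped" message above, and a sourced configuration script could plausibly attempt such a read. A rough way to check, reusing the script name from earlier in the thread:

# Background the script and note its PID
./sample_sqoop.sh &
pid=$!

# A STAT of "T" means the process has been stopped (e.g. by SIGTTIN/SIGTTOU)
jobs -l
ps -o pid,stat,cmd -p "$pid"

# Redirecting stdin away from the terminal prevents the SIGTTIN stop
./sample_sqoop.sh </dev/null >results.txt 2>&1 &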

Thanks for your help.

@Kent Baxley did you have a chance to try using screen before uninstalling the Accumulo client? Based on your discovery that redirecting output (sqoop.sh &> /dev/null &) was successful, I would think using screen would also work.

Explorer

@Alex Miller I was able to reproduce this, and I personally had luck with screen, as well as with placing the "&" inside my test script itself, at the end of the sqoop command, rather than backgrounding the script at invocation time (i.e. ./sqoop.sh &).
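
A minimal sketch of that variant, based on the script from the question; only the last line changes, and the output file name is arbitrary:

#!/bin/sh
connection_string="jdbc:sqlserver://remoteserver.somehwere.location-asp.com:1433;database=idistrict"
user_name="OPSUser"
db_password="OPSReader"
sqoop_cmd="list-databases"

# Background the sqoop command itself instead of backgrounding the whole script
sqoop $sqoop_cmd --connect $connection_string --username $user_name --password $db_password >sqoop_results.txt 2>&1 &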

The /dev/null redirect was also successful for me with Accumulo in place.

The customer had apparently already removed the Accumulo bits before they had a chance to test my suggestions, since they weren't using it anymore anyway.

So I really think there isn't a bug and we are hitting some bash-isms here more than anything else.

Thanks, all, for the tips.

@Kent Baxley Thanks for sharing this.

Explorer

I too had this problem. My bash script worked fine in my DEV environment as both a background job and a foreground job. However, in my TEST environment the job would only run as a foreground job. In TEST, running it as a nohup job would seem to stop at the point where my Sqoop step was called. Ultimately I came across this thread, which pointed me in the right direction. Essentially, you can emulate nohup by "daemonizing" your script:

setsid ./sqoop.sh </dev/null &>myLog.out &
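
For what it's worth, setsid starts the script in a new session with no controlling terminal, so terminal-generated stop signals can no longer reach it. A small usage sketch around that same one-liner:

# Run detached from the terminal; stdin, stdout, and stderr all point away from the tty
setsid ./sqoop.sh </dev/null &>myLog.out &

# Verify it is still running and watch its output
pgrep -lf sqoop.sh
tail -f myLog.out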

@Kent Baxley

Check what $PATH is used in the script and on the command line. Try setting the PATH in the script to be the same as the one you use on the command line.
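
A minimal sketch of that check; the exported value below is only an example, so copy whatever your working interactive shell actually reports:

# In the interactive shell where the script works, capture the PATH
echo "$PATH"

# At the top of the script, print the PATH it really sees and pin it if it differs
echo "script PATH: $PATH"
export PATH="/usr/local/bin:/usr/bin:/bin:/usr/hdp/current/sqoop-client/bin"   # example value only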
