
Running a Sqoop script in the background appears to 'hang' but works when running in the foreground

Contributor

Consider the following shell script for Sqoop:

#!/bin/sh

connection_string="jdbc:sqlserver://remoteserver.somehwere.location-asp.com:1433;database=idistrict"

user_name="OPSUser"

db_password="OPSReader"

sqoop_cmd="list-databases"

sqoop $sqoop_cmd --connect $connection_string --username $user_name --password $db_password

We can run this just fine in the foreground, i.e.:

./sqoop_test.sh

But running it in the background like so:

./sqoop_test.sh &

The script appears to 'hang' when kicking off the actual sqoop command, i.e. nothing happens at all.

Using -x on the #!/bin/sh line shows that we end up at the last line of the script and then nothing...

We have tried all kinds of iterations of different commands like:

nohup bash sqoop.sh > results.txt 2>&1 &

./sqoop.sh &> /dev/null &

switched to #!/bin/bash

Any ideas? The odd thing is that the exact same script works fine both foregrounded and backgrounded on a different cluster. /etc/profile and .bash_profile don't appear to have any major differences between the two clusters.

1 ACCEPTED SOLUTION

Contributor

@Alex Miller I was able to reproduce and personally had luck with screen as well as placing the "&" inside my test script itself at the end of the sqoop command rather than trying to background the script at invocation time (i.e. ./sqoop.sh &).

Redirecting the output to /dev/null also worked for me with Accumulo in place.
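The two workarounds above (backgrounding the command itself and redirecting its output) can be combined in a small wrapper. This is only a sketch under assumptions: `run_detached` and `DETACHED_PID` are hypothetical names, and redirecting stdin from /dev/null is an addition that was not tested in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): start a command in the background
# with its standard streams detached from the terminal, and remember its PID.
run_detached() {
    log_file=$1
    shift
    # stdin comes from /dev/null and stdout/stderr go to the log file, so none
    # of the command's standard streams is attached to the terminal.
    nohup "$@" < /dev/null > "$log_file" 2>&1 &
    DETACHED_PID=$!
}

# Example call, using the variables from the script in the question:
# run_detached results.txt sqoop "$sqoop_cmd" --connect "$connection_string" \
#     --username "$user_name" --password "$db_password"
# wait "$DETACHED_PID"
```

The PID is stored in a variable rather than printed so that the calling shell can still `wait` on the job.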

The customer had apparently gone ahead and removed the Accumulo bits before they had a chance to test my suggestions, since they weren't using it any further anyway.

So I really think there isn't a bug and we are hitting some bash-isms here more than anything else.

Thanks, all, for the tips.


13 REPLIES

Master Mentor

@Kent Baxley

Is this cluster set up with queues?

Can you check if sqoop is waiting on other jobs to finish?

Contributor

@Neeraj - No other jobs are waiting to finish and we can run this pretty much at-will in the foreground without things getting seemingly stuck.

Master Mentor

Add "`whoami`" and "`hostname`" to the script and see what prints out. I'd also add "2>&1 | tee -a log", i.e. send the console output to a file as well, so you can compare the output in the foreground and in the background. It should give you some insight into what's happening. What specifically is the reason for running it in the background, @Kent Baxley?
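A minimal sketch of such a diagnostic, assuming it is pasted at the top of the test script. The `print_context` name and the stdin check are illustrative additions, not something suggested in the thread.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: print who runs the script and where, and
# whether standard input is attached to a terminal.
print_context() {
    echo "user: $(whoami)"
    # Fall back to uname -n on systems without a hostname command.
    echo "host: $(hostname 2>/dev/null || uname -n)"
    if [ -t 0 ]; then
        echo "stdin: terminal"
    else
        echo "stdin: not a terminal"
    fi
}

# Usage, appending to a log as suggested above:
# print_context 2>&1 | tee -a log
```

Only stdin is checked because piping through tee means stdout is always a pipe, so a check on stdout would report the same result in both cases.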

Master Mentor

check limits on the user in the background and in the foreground. There may be an OS limit on background processes.
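One way to compare the limits, sketched under the assumption that it is run as the same user that runs the sqoop script. `dump_limits` and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write this shell's resource limits to a file.
dump_limits() {
    ulimit -a > "$1"
}

out_dir=${TMPDIR:-/tmp}
dump_limits "$out_dir/limits_fg.txt"      # taken in the foreground
dump_limits "$out_dir/limits_bg.txt" &    # taken in a background job
wait                                      # let the background dump finish

if cmp -s "$out_dir/limits_fg.txt" "$out_dir/limits_bg.txt"; then
    echo "limits are identical in foreground and background"
else
    echo "limits differ; compare the two files"
fi
```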

avatar

I would recommend using screen. It gives you all of the benefits of a background job with all of the benefits of a job running in the foreground as well: https://wiki.archlinux.org/index.php/GNU_Screen

Contributor

@Artem Ervits The reason behind the backgrounding is that there are quite a few tables with 2+ million records, and they would like to start a sqoop job in the background after hours (there has to be a better way to do this, in my opinion).

Foregrounded:

-bash-4.1$ ./sample_sqoop.sh

whoami sboddu

hostname node2.example.com

2015-11-05 19:15:03,027 INFO - [main:] ~ Running Sqoop version: 1.4.6.2.3.0.0-2557 (Sqoop:92)

2015-11-05 19:15:03,043 WARN - [main:] ~ Setting your password on the command-line is insecure. Consider using -P instead. (BaseSqoopTool:1021)

2015-11-05 19:15:03,248 INFO - [main:] ~ Using default fetchSize of 1000 (SqlManager:98)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

master

tempdb

model

msdb

ReportServer

Idistrict_Distributer

idistrict

iDistrict_Audit

iDistrict_SlimDB

iDistrict_Reports

ReportServerTempDB

Idistrict_Replication

iDistrict_Attachment

FTL

Backgrounded:

-bash-4.1$ ./sample_sqoop.sh&

[2] 33320

-bash-4.1$

whoami sboddu

hostname node2.example.com

[2]+ Stopped ./sample_sqoop.sh

Master Mentor

You're sqooping master, model, reportdb; I'm not sure you need to do that, and I would limit the job to just the ones you need. Other than that, please check ulimit for the user executing the job in the foreground and in the background: http://www.commandlinefu.com/commands/view/9893/find-ulimit-values-of-currently-running-process

Contributor

@Artem Ervits It turns out that the deciding factor was having the Accumulo client installed on the machine alongside sqoop.

With Accumulo client in the mix, the sqoop script, if invoked to run in the background, would go into a Stopped state and could only resume if the script were foregrounded using the "fg" command.
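To confirm that a backgrounded job is stopped rather than just slow, its process state can be read directly. The sketch below assumes Linux with /proc mounted; `proc_state` is a hypothetical helper. As general shell behaviour, a background job that touches the terminal can be stopped by SIGTTIN or SIGTTOU, but this thread does not establish which signal was involved here.

```shell
#!/bin/sh
# Hypothetical helper: print the one-letter state of a process on Linux.
# "T" means the process is stopped by a job-control signal.
proc_state() {
    sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "/proc/$1/status"
}

# Usage with the PID the shell printed when the job was backgrounded
# (33320 in the transcript above):
# proc_state 33320
```

In an interactive shell, `jobs -l` reports the same condition as "Stopped" next to the job's PID.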

Uninstalling the Accumulo client is what ultimately worked around the issue.

I'm not sure whether this is a bug or a consequence of the fact that sqoop itself is a bash script that calls another script, which in turn sources the configure-sqoop script.

Thanks for your help.

avatar

@Kent Baxley did you have a chance to try using screen before uninstalling Accumulo client? Based on your discovery that redirecting output (sqoop.sh &> /dev/null &) was successful, I would think using screen would also work.