Created 11-03-2015 09:12 PM
Consider the following bash script for Sqoop:
#!/bin/sh
connection_string="jdbc:sqlserver://remoteserver.somehwere.location-asp.com:1433;database=idistrict"
user_name="OPSUser"
db_password="OPSReader"
sqoop_cmd="list-databases"
sqoop $sqoop_cmd --connect $connection_string --username $user_name --password $db_password
We can run this just fine in the foreground, i.e.:
./sqoop_test.sh
But running it in the background like so:
./sqoop_test.sh &
The script appears to 'hang' when kicking off the actual sqoop command...i.e. nothing happens at all.
Using -x on the #!/bin/sh line shows that we end up at the last line of the script and then nothing...
We have tried all kinds of iterations of different commands like:
nohup bash sqoop.sh > results.txt 2>&1 &
./sqoop.sh &> /dev/null &
switched to #!/bin/bash
Any ideas? The odd thing is that the exact same script works fine both foregrounded and backgrounded on a different cluster. /etc/profile and .bash_profile don't appear to have any major differences.
Created 11-09-2015 04:29 PM
@Alex Miller I was able to reproduce and personally had luck with screen as well as placing the "&" inside my test script itself at the end of the sqoop command rather than trying to background the script at invocation time (i.e. ./sqoop.sh &).
Redirecting the output to /dev/null also worked for me with Accumulo in place.
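A minimal sketch of the "put the & inside the script" variant, for anyone who wants to try it. The run_sqoop function is a hypothetical stand-in for the real sqoop line so the snippet runs anywhere, and results.txt is just an example file name:

```shell
#!/bin/sh
# Hypothetical stand-in for the real command, e.g.
#   sqoop "$sqoop_cmd" --connect "$connection_string" --username "$user_name" ...
run_sqoop() {
  echo "placeholder for: sqoop $*"
}

# Background the command itself, inside the script, and send its
# stdout and stderr to a file instead of the terminal.
run_sqoop list-databases > results.txt 2>&1 &
SQOOP_PID=$!
echo "started background job with pid $SQOOP_PID"

# Optional: block until the job finishes. Drop this line if the script
# should return immediately.
wait "$SQOOP_PID"
```

The script is then invoked normally (./sqoop.sh) rather than with a trailing & on the command line.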
The customer had apparently already removed the Accumulo bits before they had a chance to test my suggestions, since they weren't using it any further anyway.
So I really think there isn't a bug and we are hitting some bash-isms here more than anything else.
Thanks, all, for the tips.
Created 11-03-2015 09:25 PM
Is this cluster set up with queues?
Can you check if sqoop is waiting on other jobs to finish?
Created 11-03-2015 09:41 PM
@Neeraj - No other jobs are waiting to finish and we can run this pretty much at-will in the foreground without things getting seemingly stuck.
Created 11-03-2015 09:54 PM
Add "`whoami`" and "`hostname`" to the script and see what prints out. I'd also add "2>&1 | tee -a log", i.e. redirect the console output to a file, so you can compare the output in the foreground and in the background. It should give you some insight into what's happening. What specifically is the reason for running it in the background, @Kent Baxley?
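Roughly what that instrumentation could look like. This is only a sketch: run_job is a hypothetical placeholder for the sqoop command, and sqoop_debug.log is an assumed log file name.

```shell
#!/bin/sh
# Assumed log location; override by exporting LOG_FILE before running.
LOG_FILE="${LOG_FILE:-sqoop_debug.log}"

# Hypothetical placeholder for the real sqoop invocation.
run_job() {
  echo "placeholder for the sqoop command"
}

# Group the commands so that everything they print (stdout and stderr)
# goes both to the console and, appended, to the log file.
{
  echo "whoami: $(whoami)"
  echo "hostname: $(hostname 2>/dev/null || uname -n)"
  run_job
} 2>&1 | tee -a "$LOG_FILE"
```

Keep in mind that with tee in place the job writes to a pipe rather than directly to the terminal, which by itself may change how the job behaves in the background.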
Created 11-03-2015 09:56 PM
Check the limits on the user in the background and in the foreground. There may be an OS limit on background processes.
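A quick way to compare the two, as a sketch. The file names limits_fg.txt and limits_bg.txt are arbitrary. Limits are normally inherited by child processes, so a difference here would be unusual, but it is cheap to rule out:

```shell
#!/bin/sh
# Record the limits seen by the foreground shell.
ulimit -a > limits_fg.txt

# Record the limits seen by a backgrounded subshell.
( ulimit -a > limits_bg.txt ) &
wait

# Compare the two snapshots.
if diff limits_fg.txt limits_bg.txt > /dev/null; then
  LIMITS_RESULT="same"
else
  LIMITS_RESULT="different"
fi
echo "foreground vs background limits: $LIMITS_RESULT"
```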
Created 11-03-2015 10:34 PM
I would recommend using screen. It gives you all of the benefits of a background job with all of the benefits of a job running in the foreground as well: https://wiki.archlinux.org/index.php/GNU_Screen
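For example, something along these lines starts the script in a detached screen session. The session name sqoop_job is made up for illustration, and the snippet does nothing if screen or the script is not available:

```shell
#!/bin/sh
SESSION_NAME="sqoop_job"    # hypothetical session name
SCRIPT="./sqoop_test.sh"    # the script from the original question

if command -v screen > /dev/null 2>&1 && [ -x "$SCRIPT" ]; then
  # -d -m: start the session detached; -S: give it a name.
  screen -dmS "$SESSION_NAME" "$SCRIPT"
  echo "started; list sessions with 'screen -ls', reattach with 'screen -r $SESSION_NAME'"
else
  echo "screen or $SCRIPT not available; nothing started"
fi
```

Because the job runs in the foreground of its own pseudo-terminal inside the session, it keeps a terminal while your login shell stays free.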
Created 11-06-2015 12:58 AM
@Artem Ervits The reason behind the backgrounding is that there are quite a few tables with 2+ million records, and they would like to start a sqoop job in the background after hours (there has to be a better way to do this, in my opinion).
Foregrounded:
-bash-4.1$ ./sample_sqoop.sh
whoami sboddu
hostname node2.example.com
2015-11-05 19:15:03,027 INFO - [main:] ~ Running Sqoop version: 1.4.6.2.3.0.0-2557 (Sqoop:92)
2015-11-05 19:15:03,043 WARN - [main:] ~ Setting your password on the command-line is insecure. Consider using -P instead. (BaseSqoopTool:1021)
2015-11-05 19:15:03,248 INFO - [main:] ~ Using default fetchSize of 1000 (SqlManager:98)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
master
tempdb
model
msdb
ReportServer
Idistrict_Distributer
idistrict
iDistrict_Audit
iDistrict_SlimDB
iDistrict_Reports
ReportServerTempDB
Idistrict_Replication
iDistrict_Attachment
FTL
Backgrounded:
-bash-4.1$ ./sample_sqoop.sh&
[2] 33320
-bash-4.1$
whoami sboddu
hostname node2.example.com
[2]+ Stopped ./sample_sqoop.sh
Created 11-06-2015 01:25 AM
You're sqooping master, model, and reportdb; I'm not sure you need to do that, so I would limit the tables to just the ones you need. Other than that, please check ulimit for the user executing the job in the foreground and in the background: http://www.commandlinefu.com/commands/view/9893/find-ulimit-values-of-currently-running-process
Created 11-06-2015 10:12 PM
@Artem Ervits Turns out that the factor was having the Accumulo Client installed on the machine alongside sqoop.
With Accumulo client in the mix, the sqoop script, if invoked to run in the background, would go into a Stopped state and could only resume if the script were foregrounded using the "fg" command.
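A job that shows as Stopped and only resumes under "fg" is consistent with a background process being suspended by the shell's job control for touching the terminal (SIGTTIN on a read, SIGTTOU on some writes or terminal-setting changes). Whether that is what happens in the launch chain here is an assumption and was not confirmed in this thread. A sketch of how to take the terminal out of the picture, with run_job as a hypothetical stand-in that reads stdin the way a terminal-reading job would:

```shell
#!/bin/sh
JOB_LOG="job.log"    # arbitrary log file name

# Hypothetical stand-in for the real job: it reads stdin until EOF,
# which is the kind of access that gets a background job stopped
# when stdin is still the terminal.
run_job() {
  cat
  echo "job finished"
}

# Give the job /dev/null as stdin and a file for stdout/stderr, so it
# never needs the terminal while running in the background.
run_job < /dev/null > "$JOB_LOG" 2>&1 &
JOB_PID=$!

wait "$JOB_PID"
JOB_STATUS=$?
echo "job $JOB_PID exited with status $JOB_STATUS"
```

Applied to the script in question, that would be something like: ./sqoop_test.sh < /dev/null > results.txt 2>&1 &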
Uninstalling the Accumulo client was what ultimately worked around / fixed the issue.
Not sure if this is a bug or due to the fact that sqoop itself is a bash script that calls another script that sources the configure-sqoop script.
Thanks for your help.
Created 11-09-2015 04:19 PM
@Kent Baxley did you have a chance to try using screen before uninstalling Accumulo client? Based on your discovery that redirecting output (sqoop.sh &> /dev/null &) was successful, I would think using screen would also work.