Scheduling Spark with Crontab

Explorer

I have written a Spark application in Python and tested it successfully. I run it with spark-submit from the command line.

Everything seems to work fine and I get the expected output.

The problem is that when I schedule the application through crontab to run every 5 minutes, it fails with the following error:

 

/u01/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/bin/compute-classpath.sh: line 64: hadoop: command not found
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.deploy.SparkSubmitArguments.parse$1(SparkSubmitArguments.scala:313)
    at org.apache.spark.deploy.SparkSubmitArguments.parseOpts(SparkSubmitArguments.scala:207)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:59)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 5 more

 

It looks to me as if crontab does not load the environment variables where I store all the paths to the jars (the hadoop classpath is missing when the script is launched by crontab). Has anyone encountered this issue? I tried some of these solutions: http://unix.stackexchange.com/questions/27289/how-can-i-run-a-cron-command-with-existing-environment...

 

 

1 ACCEPTED SOLUTION

Re: Scheduling Spark with Crontab

Explorer

Try adding 

 

source /etc/hadoop/conf/hadoop-env.sh

source /etc/spark/conf/spark-env.sh

 

to the top of the shell script that submits your Spark job. I don't have a VM with Spark / Hadoop handy right now, but IIRC that's what I've needed to do in the past.
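
A minimal sketch of such a wrapper script, under stated assumptions: the two environment files are the ones named above, while the helper names `load_env` and `run_job` and the application path `/home/myuser/my_app.py` are hypothetical placeholders, not taken from the original setup. The script skips a missing environment file with a warning rather than aborting, and reports when `spark-submit` cannot be found on the PATH.

```shell
#!/bin/bash
# Hypothetical wrapper script that cron could call instead of spark-submit.

# Source an environment file if it is readable; warn and continue otherwise.
load_env() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "warning: $1 not found, skipping" >&2
    fi
}

# Submit the application, or return 127 if spark-submit is not on the PATH.
run_job() {
    if command -v spark-submit >/dev/null 2>&1; then
        spark-submit "$1"
    else
        echo "spark-submit not found, PATH=$PATH" >&2
        return 127
    fi
}

# The two files suggested in the reply above.
load_env /etc/hadoop/conf/hadoop-env.sh
load_env /etc/spark/conf/spark-env.sh

# Placeholder path: replace with the real location of the Spark application.
APP="${1:-/home/myuser/my_app.py}"

run_job "$APP" || echo "job failed with status $?" >&2
```

Because cron usually starts jobs with a very small PATH, it may also be necessary to call `spark-submit` by its absolute path inside the script.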

 

Re: Scheduling Spark with Crontab

Master Collaborator

Whatever user you are running this as doesn't seem to have the PATH or env variables set up. See the first error:

 

hadoop: command not found

Re: Scheduling Spark with Crontab

Explorer

Thank you, Sowen, for the reply, but what I meant was that the hadoop classpath and other settings are missing only when the script is launched by crontab. I have no problems when I launch the script manually.

Re: Scheduling Spark with Crontab

Master Collaborator

Right. What user is used in each case?

Re: Scheduling Spark with Crontab

Explorer

In both cases I use the same user (my own user name). I define the crontab schedule with "crontab -e" under that user.

Re: Scheduling Spark with Crontab

Master Collaborator

Is some of the environment setup only happening in your shell config, which is triggered only for interactive shells?

The problem is fairly clear: the environment is not set up, and the question is why. It's not really a Spark issue per se.
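
One way to check this is to compare the variables in the interactive shell with those in a stripped-down environment. The sketch below is only an approximation: it simulates cron with `env -i` and a typical cron default of `SHELL=/bin/sh` and `PATH=/usr/bin:/bin`, which may differ from the real cron daemon on a given system. The function name `missing_in_minimal_env` is a hypothetical helper.

```shell
#!/bin/bash
# Print the names of variables that exist in the current shell but not in a
# minimal, cron-like environment (simulated here with `env -i`).
missing_in_minimal_env() {
    local current minimal
    current=$(mktemp)
    minimal=$(mktemp)

    # Variable names visible to the current shell.
    env | grep -E '^[A-Za-z_][A-Za-z0-9_]*=' | cut -d= -f1 | sort -u > "$current"

    # Variable names visible to a shell started with an almost empty environment.
    env -i HOME="$HOME" SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c 'env' | grep -E '^[A-Za-z_][A-Za-z0-9_]*=' | cut -d= -f1 | sort -u > "$minimal"

    # Lines only in the first file: variables the cron-like environment lacks.
    comm -23 "$current" "$minimal"
    rm -f "$current" "$minimal"
}

missing_in_minimal_env
```

Any Hadoop or Spark related names in the output are candidates for variables that are set only by the interactive shell configuration. For an exact comparison, a temporary crontab entry that writes the output of `env` to a file shows what cron really provides.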

Re: Scheduling Spark with Crontab

Explorer

Thank you all!

 

I have re-set the env variables in crontab as you suggested. It seems to work fine!
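
For readers setting up something similar, here is a hedged sketch of a crontab line that runs a script every 5 minutes and keeps its output in a log. The helper `cron_entry` and both paths are hypothetical, and the `*/5` step syntax assumes a Vixie-style cron such as the one shipped with most Linux distributions.

```shell
#!/bin/bash
# Build a crontab line that runs a script every 5 minutes and appends
# both stdout and stderr to a log file.
cron_entry() {
    local script="$1" log="$2"
    printf '%s\n' "*/5 * * * * $script >> $log 2>&1"
}

# Hypothetical paths: replace with the real script and log locations.
cron_entry /home/myuser/run_spark_job.sh /tmp/spark_job.log
```

The printed line can be pasted into the editor opened by "crontab -e". Redirecting the output to a log file makes errors from the cron run visible after the fact.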

 

 

Re: Scheduling Spark with Crontab

Explorer

Hi,

 

I am trying to schedule a Spark job using cron.

I have written a shell script, and it runs well from the terminal.

However, when the script is run by cron, it fails with an error saying there is insufficient memory to start a JVM thread.

Every time I start the script from the terminal there is no issue. The error only occurs when the script is started by cron.

Could you kindly suggest something?