<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Scheduling Spark with Crontab in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25050#M5128</link>
    <description>&lt;P&gt;Whatever user you are running this as doesn't seem to have the PATH or env variables set up. See the first error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;hadoop: command not found&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 25 Feb 2015 18:37:01 GMT</pubDate>
    <dc:creator>srowen</dc:creator>
    <dc:date>2015-02-25T18:37:01Z</dc:date>
    <item>
      <title>Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25046#M5127</link>
      <description>&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;I have written a Spark application in Python and tested it successfully. I run it with spark-submit from the command line.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;Everything seems to work fine and I get the expected output.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;The problem is that when I schedule the application through crontab to run every 5 minutes, it fails with the following error:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;/u01/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/bin/compute-classpath.sh: line 64: hadoop: command not found&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at org.apache.spark.deploy.SparkSubmitArguments.parse$1(SparkSubmitArguments.scala:313)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at org.apache.spark.deploy.SparkSubmitArguments.parseOpts(SparkSubmitArguments.scala:207)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at org.apache.spark.deploy.SparkSubmitArguments.&amp;lt;init&amp;gt;(SparkSubmitArguments.scala:59)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at java.net.URLClassLoader$1.run(URLClassLoader.java:366)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at java.net.URLClassLoader$1.run(URLClassLoader.java:355)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at java.security.AccessController.doPrivileged(Native Method)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at java.net.URLClassLoader.findClass(URLClassLoader.java:354)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:425)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:358)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;... 5 more&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;It looks to me as though crontab does not load the environment variables in which I store the paths to the jars (the hadoop classpath is missing when the script is launched by crontab). Has anyone encountered this issue? I tried some of these solutions:&amp;nbsp;&lt;A target="_self" href="http://unix.stackexchange.com/questions/27289/how-can-i-run-a-cron-command-with-existing-environmental-variables"&gt;http://unix.stackexchange.com/questions/27289/how-can-i-run-a-cron-command-with-existing-environmental-variables&lt;/A&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:22:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25046#M5127</guid>
      <dc:creator>mgavrilescu</dc:creator>
      <dc:date>2022-09-16T09:22:39Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25050#M5128</link>
      <description>&lt;P&gt;Whatever user you are running this as doesn't seem to have the PATH or env variables set up. See the first error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;hadoop: command not found&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Feb 2015 18:37:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25050#M5128</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2015-02-25T18:37:01Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25059#M5129</link>
      <description>&lt;P&gt;Thank you, srowen, for the reply, but what I was actually saying is that the hadoop classpath and the other environment settings are missing only when the script is launched by crontab. I have no problems when I launch the script manually.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2015 08:45:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25059#M5129</guid>
      <dc:creator>mgavrilescu</dc:creator>
      <dc:date>2015-02-26T08:45:07Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25060#M5130</link>
      <description>&lt;P&gt;Right. What user is used in each case?&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2015 09:23:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25060#M5130</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2015-02-26T09:23:27Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25061#M5131</link>
      <description>&lt;P&gt;In both cases I use the same user (my own user name). To schedule the cron job I run "crontab -e" as that user.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2015 09:26:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25061#M5131</guid>
      <dc:creator>mgavrilescu</dc:creator>
      <dc:date>2015-02-26T09:26:02Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25062#M5132</link>
      <description>&lt;P&gt;Is some of the environment setup happening only in shell config that is read for interactive shells?&lt;/P&gt;&lt;P&gt;The problem is fairly clear: the environment is not set up. The question is why, but it's not really a Spark issue per se.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2015 09:28:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25062#M5132</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2015-02-26T09:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25111#M5133</link>
      <description>&lt;P&gt;Try adding&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;source /etc/hadoop/conf/hadoop-env.sh&lt;/P&gt;&lt;P&gt;source /etc/spark/conf/spark-env.sh&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;to the top of the shell script that submits your Spark job. I don't have a VM with Spark / Hadoop handy right now, but IIRC that's what I've needed to do in the past.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Feb 2015 07:52:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25111#M5133</guid>
      <dc:creator>hadooptom</dc:creator>
      <dc:date>2015-02-27T07:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25119#M5134</link>
      <description>&lt;P&gt;Thank you all!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have re-set the env variables in crontab as you suggested. It seems to work fine!&lt;/P&gt;</description>
      <pubDate>Fri, 27 Feb 2015 14:28:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/25119#M5134</guid>
      <dc:creator>mgavrilescu</dc:creator>
      <dc:date>2015-02-27T14:28:06Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling Spark with Crontab</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/34301#M5135</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to schedule a Spark job using cron.&lt;/P&gt;&lt;P&gt;I have written a shell script, and it runs fine from the terminal.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, when I run the script from cron, it fails with an "insufficient memory to start JVM thread" error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Whenever I start the script from the terminal there is no issue. The error occurs only when cron starts the script.&lt;/P&gt;&lt;P&gt;Could you kindly suggest something?&lt;/P&gt;</description>
      <pubDate>Sat, 21 Nov 2015 13:38:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Scheduling-Spark-with-Crontab/m-p/34301#M5135</guid>
      <dc:creator>Sarthak</dc:creator>
      <dc:date>2015-11-21T13:38:34Z</dc:date>
    </item>
  </channel>
</rss>

