Error "tried to access method com.google.common.base.Stopwatch...." in Hue/Oozie Spark action.

This is frustrating because I had this working previously, but it no longer works correctly.

 

I'm running TeraGen/TeraSort/TeraValidate from the com.github.ehiggs.spark.terasort library as a training exercise.

 

I can usually execute TeraGen successfully, but on the TeraSort step, I get the error: 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

If I move the TeraSort step above the TeraGen step, I can execute TeraSort, then TeraGen, then TeraSort again, but I get that error on TeraValidate.
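
For anyone hitting the same error: an IllegalAccessError on Stopwatch.<init>() usually points to a Guava version conflict, because Hadoop 2.x's FileInputFormat calls the no-argument Stopwatch constructor and newer Guava releases (17.0 onward, as far as I know) no longer expose it publicly. Nothing in this thread confirms that this was the cause here, so treat the following as a diagnostic sketch only. The class name StopwatchOrigin and the helper describe are made up for illustration; the code reports which jar a class was loaded from and whether its no-argument constructor is public.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Oozie, Spark, or spark-terasort.
class StopwatchOrigin {

    /** Reports where a class was loaded from and whether its no-arg constructor is public. */
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className);

            // Classes loaded by the JDK bootstrap loader have no code source.
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "bootstrap/JDK"
                    : src.getLocation().toString();

            String ctor;
            try {
                Constructor<?> c = cls.getDeclaredConstructor();
                ctor = Modifier.isPublic(c.getModifiers()) ? "public" : "not public";
            } catch (NoSuchMethodException e) {
                ctor = "absent";
            }
            return className + " loaded from " + location
                    + "; no-arg constructor is " + ctor;
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        // The class named in the error message above.
        System.out.println(describe("com.google.common.base.Stopwatch"));
    }
}
```

The output is only meaningful if this runs with the same classpath as the failing action. If it reports a jar other than the cluster's own Guava and a constructor that is not public, a bundled or user-supplied Guava is the likely culprit.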

 

Can anyone identify what I'm doing wrong?

 

The Hue/Oozie editor creates the following workflow.xml file:

 

<workflow-app name="TeraGen_-_TeraSort_-_TeraValidate" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-0883"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="spark-f631">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master></master>
      <mode></mode>
      <name>TeraSort</name>
      <class>com.github.ehiggs.spark.terasort.TeraSort</class>
      <jar>/user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</jar>
      <spark-opts>--jars /user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</spark-opts>
      <arg>/user/davidw/terasort-benchmark.in</arg>
      <arg>/user/davidw/terasort-benchmark.out</arg>
    </spark>
    <ok to="spark-504c"/>
    <error to="Kill"/>
  </action>
  <action name="spark-0883">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.in"/>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.validate"/>
      </prepare>
      <master></master>
      <mode></mode>
      <name>TeraGen</name>
      <class>com.github.ehiggs.spark.terasort.TeraGen</class>
      <jar>/user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</jar>
      <spark-opts>--jars /user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</spark-opts>
      <arg>1g</arg>
      <arg>/user/davidw/terasort-benchmark.in</arg>
    </spark>
    <ok to="spark-f631"/>
    <error to="Kill"/>
  </action>
  <action name="spark-504c">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master></master>
      <mode></mode>
      <name>TeraValidate</name>
      <class>com.github.ehiggs.spark.terasort.TeraValidate</class>
      <jar>/user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</jar>
      <spark-opts>--jars /user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</spark-opts>
      <arg>/user/davidw/terasort-benchmark.out</arg>
      <arg>/user/davidw/terasort-benchmark.validate</arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>

 

 

1 ACCEPTED SOLUTION

I was able to rebuild the Oozie job and make it work, although I don't know which change made the difference.

I built the job in sequence this time, so that the steps appear in execution order in the XML file.

I also had each step reference the jar through the lib directory in the job's path (lib/spark-terasort.jar).

Explicit references had worked for me before, but they don't seem to be necessary.

I moved each prepare step to the action that needs it instead of putting them all on the first step.

I removed the output directory argument for TeraValidate because it doesn't seem to be used.

Finally, I let Hue/Oozie choose the defaults for Master and Mode, which show up as local[*] and client in the XML below. I experimented with YARN and cluster mode, but neither worked.

 

My resulting XML (that works) looks like this:

 

<workflow-app name="TeraGen-TeraSort-TeraValidate" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-27f0"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="spark-27f0">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.in"/>
      </prepare>
      <master>local[*]</master>
      <mode>client</mode>
      <name>TeraGen</name>
      <class>com.github.ehiggs.spark.terasort.TeraGen</class>
      <jar>lib/spark-terasort.jar</jar>
      <arg>1g</arg>
      <arg>/user/davidw/terasort-benchmark.in</arg>
    </spark>
    <ok to="spark-94fc"/>
    <error to="Kill"/>
  </action>
  <action name="spark-94fc">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/>
      </prepare>
      <master>local[*]</master>
      <mode>client</mode>
      <name>TeraSort</name>
      <class>com.github.ehiggs.spark.terasort.TeraSort</class>
      <jar>lib/spark-terasort.jar</jar>
      <arg>/user/davidw/terasort-benchmark.in</arg>
      <arg>/user/davidw/terasort-benchmark.out</arg>
    </spark>
    <ok to="spark-bcf9"/>
    <error to="Kill"/>
  </action>
  <action name="spark-bcf9">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>local[*]</master>
      <mode>client</mode>
      <name>TeraValidate</name>
      <class>com.github.ehiggs.spark.terasort.TeraValidate</class>
      <jar>lib/spark-terasort.jar</jar>
      <arg>/user/davidw/terasort-benchmark.out</arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>
 
