Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Error "tried to access method com.google.common.base.Stopwatch...." in Hue/Oozie Spark action.


This is frustrating because I had this working previously, but it no longer works correctly.

 

I'm executing TeraGen/TeraSort/TeraValidate from the com.github.ehiggs.spark.terasort library as a training exercise.

 

I can usually execute TeraGen successfully, but the TeraSort step fails with this error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

If I move the TeraSort step above the TeraGen step, I can execute TeraSort, then TeraGen, then TeraSort again, but I get that error on TeraValidate.

 

Can anyone identify what I'm doing wrong?

 

The Hue/Oozie editor creates the following workflow.xml file:

 

<workflow-app name="TeraGen_-_TeraSort_-_TeraValidate" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-0883"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="spark-f631">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master></master>
      <mode></mode>
      <name>TeraSort</name>
      <class>com.github.ehiggs.spark.terasort.TeraSort</class>
      <jar>/user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</jar>
      <spark-opts>--jars /user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</spark-opts>
      <arg>/user/davidw/terasort-benchmark.in</arg>
      <arg>/user/davidw/terasort-benchmark.out</arg>
    </spark>
    <ok to="spark-504c"/>
    <error to="Kill"/>
  </action>
  <action name="spark-0883">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.in"/>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.validate"/>
      </prepare>
      <master></master>
      <mode></mode>
      <name>TeraGen</name>
      <class>com.github.ehiggs.spark.terasort.TeraGen</class>
      <jar>/user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</jar>
      <spark-opts>--jars /user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</spark-opts>
      <arg>1g</arg>
      <arg>/user/davidw/terasort-benchmark.in</arg>
    </spark>
    <ok to="spark-f631"/>
    <error to="Kill"/>
  </action>
  <action name="spark-504c">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master></master>
      <mode></mode>
      <name>TeraValidate</name>
      <class>com.github.ehiggs.spark.terasort.TeraValidate</class>
      <jar>/user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</jar>
      <spark-opts>--jars /user/hue/oozie/workspaces/hue-oozie-1450123297.08/lib/spark-terasort.jar</spark-opts>
      <arg>/user/davidw/terasort-benchmark.out</arg>
      <arg>/user/davidw/terasort-benchmark.validate</arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>

 

 

1 ACCEPTED SOLUTION


I was able to rebuild the Oozie job and make it work, although I really don't know what is different.

 

I built the job in sequence this time, so that the steps appear in execution order in the XML file.

I also pointed each step's jar at the lib directory in the job's workspace (lib/spark-terasort.jar) instead of using an absolute HDFS path with a matching --jars option.

Explicit references had worked for me before, but they don't seem to be necessary.

I moved each prepare (delete) step to the action that needs it, instead of putting them all on the first step.

I eliminated the output directory definition for TeraValidate because it doesn't seem to be used.

Finally, I let Hue/Oozie choose the defaults for Master and Mode (local[*] and client in the XML below). I experimented with YARN and cluster, but these didn't work.

 

My resulting XML (that works) looks like this:

 

<workflow-app name="TeraGen-TeraSort-TeraValidate" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-27f0"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="spark-27f0">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.in"/>
      </prepare>
      <master>local[*]</master>
      <mode>client</mode>
      <name>TeraGen</name>
      <class>com.github.ehiggs.spark.terasort.TeraGen</class>
      <jar>lib/spark-terasort.jar</jar>
      <arg>1g</arg>
      <arg>/user/davidw/terasort-benchmark.in</arg>
    </spark>
    <ok to="spark-94fc"/>
    <error to="Kill"/>
  </action>
  <action name="spark-94fc">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/>
      </prepare>
      <master>local[*]</master>
      <mode>client</mode>
      <name>TeraSort</name>
      <class>com.github.ehiggs.spark.terasort.TeraSort</class>
      <jar>lib/spark-terasort.jar</jar>
      <arg>/user/davidw/terasort-benchmark.in</arg>
      <arg>/user/davidw/terasort-benchmark.out</arg>
    </spark>
    <ok to="spark-bcf9"/>
    <error to="Kill"/>
  </action>
  <action name="spark-bcf9">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>local[*]</master>
      <mode>client</mode>
      <name>TeraValidate</name>
      <class>com.github.ehiggs.spark.terasort.TeraValidate</class>
      <jar>lib/spark-terasort.jar</jar>
      <arg>/user/davidw/terasort-benchmark.out</arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>
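
A note on the error itself, since the thread never pins down the cause: an IllegalAccessError on com.google.common.base.Stopwatch.<init>() is commonly a sign of two Guava versions meeting on the classpath. Older Hadoop FileInputFormat code calls the Stopwatch constructor directly, and later Guava releases no longer expose that constructor as public. Whether that is what happened in this workflow is an assumption, not something the thread confirms. The sketch below is one way to check: it reports which jar a class was loaded from and whether its no-argument constructor is public. The class name StopwatchCheck and the method describe are hypothetical names made up for this illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Oozie, Spark, or Guava.
public class StopwatchCheck {

    // Reports where a class was loaded from and whether its
    // no-argument constructor is public.
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // CodeSource is null for classes loaded by the bootstrap loader.
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null)
                    ? "bootstrap/unknown"
                    : String.valueOf(src.getLocation());
            Constructor<?> ctor = cls.getDeclaredConstructor();
            boolean isPublic = Modifier.isPublic(ctor.getModifiers());
            return className + " from " + location
                    + ", public no-arg constructor: " + isPublic;
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        } catch (NoSuchMethodException e) {
            return className + " has no no-arg constructor";
        }
    }

    public static void main(String[] args) {
        // The class named in the IllegalAccessError above.
        System.out.println(describe("com.google.common.base.Stopwatch"));
    }
}
```

To be meaningful, this would have to run with the same classpath as the failing action. If it reports a non-public constructor, the Guava copy that wins on that classpath is probably newer than the one the Hadoop classes expect.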
 

