Using Mahout 0.8 to build a recommendation app on CDH5

Explorer

Hi Sean, I have a problem running a Mahout app on CDH5. Please help me solve it.

The code is:

import org.apache.hadoop.mapred.JobConf;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class ItemCFHadoop {

    // Base URI of the HDFS NameNode
    private static final String HDFS = "hdfs://192.168.1.210:9000";

    public static void main(String[] args) throws Exception {
        String localFile = "datafile/item.csv";
        String inPath = HDFS + "/user/hdfs/userCF";
        String inFile = inPath + "/item.csv";
        String outPath = HDFS + "/user/hdfs/userCF/result/";
        String outFile = outPath + "/part-r-00000";
        String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

        JobConf conf = config();

        // Build the RecommenderJob command line, then split it into an argument array
        StringBuilder sb = new StringBuilder();
        sb.append("--input ").append(inPath);
        sb.append(" --output ").append(outPath);
        sb.append(" --booleanData true");
        sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        sb.append(" --tempDir ").append(tmpPath);
        args = sb.toString().split(" ");

        // Run Mahout's item-based recommender with the cluster configuration
        RecommenderJob job = new RecommenderJob();
        job.setConf(conf);
        job.run(args);
    }

    // Create the job configuration and register the cluster's site files
    public static JobConf config() {
        JobConf conf = new JobConf(ItemCFHadoop.class);
        conf.setJobName("ItemCFHadoop");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }
}

 

The exception is:

[hdfs@hadoop140 mahout]$ hadoop jar /home/mahout/myMahout.jar
mahout in hadoop cluster running...

14/04/25 17:21:16 INFO common.AbstractJob: Command line arguments: {--booleanData=[true], --endPhase=[2147483647], --input=[hdfs://192.168.1.140:9000/user/hdfs/userCF], --maxPrefsPerUser=[10], --maxPrefsPerUserInItemSimilarity=[1000], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --output=[hdfs://192.168.1.140:9000/user/hdfs/userCF/result/], --similarityClassname=[org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity], --startPhase=[0], --tempDir=[hdfs://192.168.1.140:9000/tmp/1398417675558]}
14/04/25 17:21:16 INFO common.AbstractJob: Command line arguments: {--booleanData=[true], --endPhase=[2147483647], --input=[hdfs://192.168.1.140:9000/user/hdfs/userCF], --maxPrefsPerUser=[1000], --minPrefsPerUser=[1], --output=[hdfs://192.168.1.140:9000/tmp/1398417675558/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[hdfs://192.168.1.140:9000/tmp/1398417675558]}
14/04/25 17:21:17 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/04/25 17:21:17 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
14/04/25 17:21:17 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
    at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:158)
    at org.conan.mymahout.recommendation.ItemCFHadoop.main(ItemCFHadoop.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

 

Xuesong

Master Collaborator

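If I'm not mistaken, you have built your own application that embeds code from Mahout 0.8. Stock Mahout 0.8 is not compatible with Hadoop 2, and CDH5 is based on Hadoop 2.3+. The IncompatibleClassChangeError is the symptom: org.apache.hadoop.mapreduce.JobContext was a class in Hadoop 1 and became an interface in Hadoop 2, so code compiled against Hadoop 1 fails when it links against it. CDH5 also includes a distribution of Mahout 0.8 that has been modified to work on Hadoop 2.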

 

To make this work, you can try to depend on the 0.8 distribution of Mahout from CDH 5 in your project instead, since it contains necessary modifications. 

 

For example, if building with Maven, instead of specifying version 0.8 for the artifact org.apache.mahout:mahout-core, you would specify 0.8-cdh5.0.0. You would also need to reference the Cloudera repo in your build.
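
As an illustration, the dependency entry might look like the fragment below. This is only a sketch: the coordinates and the version string are the ones named above, and it assumes the Cloudera repo is configured elsewhere in the same pom.xml.

```xml
<!-- Sketch: mahout-core pinned to the CDH5 build instead of stock 0.8 -->
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.8-cdh5.0.0</version>
</dependency>
```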

 

You could also recompile Mahout locally for Hadoop 2, but I think that is the most troublesome option.

Explorer

Hi Sean, you are right. Thanks.

 

Xuesong

Explorer

Thanks.

I would also like to know how to reference the Cloudera repo and set the Mahout version to 0.8-cdh5.0.0. When I changed the version to 0.8-cdh5.0.0, the build failed with an error.

Could you give a sample that uses Mahout version 0.8-cdh5.0.0?

 

My pom.xml is currently:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <groupId>org.xuesong.mymahout</groupId>
 <artifactId>myMahout</artifactId>
 <packaging>jar</packaging>
 <version>1.0-SNAPSHOT</version>
 <name>myMahout</name>
 <url>http://maven.apache.org</url>

 <properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <mahout.version>0.8</mahout.version>
 </properties>

 <dependencies>
  <dependency>
   <groupId>org.apache.mahout</groupId>
   <artifactId>mahout-core</artifactId>
   <version>${mahout.version}</version>
  </dependency>
  <dependency>
   <groupId>org.apache.mahout</groupId>
   <artifactId>mahout-integration</artifactId>
   <version>${mahout.version}</version>
   <exclusions>
    <exclusion>
     <groupId>org.mortbay.jetty</groupId>
     <artifactId>jetty</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.apache.cassandra</groupId>
     <artifactId>cassandra-all</artifactId>
    </exclusion>
    <exclusion>
     <groupId>me.prettyprint</groupId>
     <artifactId>hector-core</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
 </dependencies>
</project>

Xuesong

Master Collaborator

(If you encounter an error, you should state what the error is, but I assume it is "artifact not found" here.)

 

Add this near the end of your pom.xml file, inside the <project> element, so that Maven knows to look in the Cloudera repo:

 

<repositories>
  <repository>
    <id>cloudera.repo</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <name>Cloudera Repositories</name>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
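
With the repository in place, the version change itself could be made in one spot in the pom.xml posted earlier, since both Mahout dependencies there read ${mahout.version}. This is a sketch, and it assumes mahout-integration is published in the Cloudera repo under the same 0.8-cdh5.0.0 version as mahout-core, which this thread does not confirm.

```xml
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <!-- Previously 0.8; assumed to resolve for every Mahout artifact in use -->
  <mahout.version>0.8-cdh5.0.0</mahout.version>
</properties>
```

If only mahout-core resolves, the version for mahout-integration would need to be checked against what the repository actually hosts.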

 

Explorer

You are so kind. Thanks for your help.

Xuesong