Created on 04-25-2014 06:42 AM - edited 09-16-2022 01:57 AM
Hi Sean, I have a problem running a Mahout app on CDH5. Please help me solve it.
The code is:
import org.apache.hadoop.mapred.JobConf;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class ItemCFHadoop {

    private static final String HDFS = "hdfs://192.168.1.210:9000";

    public static void main(String[] args) throws Exception {
        String localFile = "datafile/item.csv";
        String inPath = HDFS + "/user/hdfs/userCF";
        String inFile = inPath + "/item.csv";
        String outPath = HDFS + "/user/hdfs/userCF/result/";
        String outFile = outPath + "/part-r-00000";
        String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

        JobConf conf = config();

        // Build the RecommenderJob command line as a single space-separated string
        StringBuilder sb = new StringBuilder();
        sb.append("--input ").append(inPath);
        sb.append(" --output ").append(outPath);
        sb.append(" --booleanData true");
        sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        sb.append(" --tempDir ").append(tmpPath);
        args = sb.toString().split(" ");

        RecommenderJob job = new RecommenderJob();
        job.setConf(conf);
        job.run(args);
    }

    // Load the cluster configuration files bundled with the application
    public static JobConf config() {
        JobConf conf = new JobConf(ItemCFHadoop.class);
        conf.setJobName("ItemCFHadoop");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }
}
The exception is:
[hdfs@hadoop140 mahout]$ hadoop jar /home/mahout/myMahout.jar
mahout in hadoop cluster running...
14/04/25 17:21:16 INFO common.AbstractJob: Command line arguments: {--booleanData=[true], --endPhase=[2147483647], --input=[hdfs://192.168.1.140:9000/user/hdfs/userCF], --maxPrefsPerUser=[10], --maxPrefsPerUserInItemSimilarity=[1000], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --output=[hdfs://192.168.1.140:9000/user/hdfs/userCF/result/], --similarityClassname=[org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity], --startPhase=[0], --tempDir=[hdfs://192.168.1.140:9000/tmp/1398417675558]}
14/04/25 17:21:16 INFO common.AbstractJob: Command line arguments: {--booleanData=[true], --endPhase=[2147483647], --input=[hdfs://192.168.1.140:9000/user/hdfs/userCF], --maxPrefsPerUser=[1000], --minPrefsPerUser=[1], --output=[hdfs://192.168.1.140:9000/tmp/1398417675558/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[hdfs://192.168.1.140:9000/tmp/1398417675558]}
14/04/25 17:21:17 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/04/25 17:21:17 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
14/04/25 17:21:17 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
    at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:158)
    at org.conan.mymahout.recommendation.ItemCFHadoop.main(ItemCFHadoop.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Created 04-25-2014 06:46 AM
If I'm not mistaken, you have built your own application that embeds code from stock Mahout 0.8. That build is not compatible with Hadoop 2, and CDH5 is based on Hadoop 2.3+. (JobContext was a class in Hadoop 1 and became an interface in Hadoop 2, which is why code compiled against Hadoop 1 fails with this IncompatibleClassChangeError.) CDH5 also includes its own distribution of Mahout 0.8 that has been modified to work on Hadoop 2.
To make this work, you can try depending on the Mahout 0.8 distribution from CDH 5 in your project instead, since it contains the necessary modifications.
For example, if building with Maven, instead of specifying version 0.8 for artifact org.apache.mahout:mahout-core you would specify 0.8-cdh5.0.0. You would also need to reference the Cloudera repo in your build.
You could also recompile Mahout locally for Hadoop 2, but I think that is the most trouble.
Created on 04-25-2014 06:27 PM - edited 04-28-2014 05:35 PM
Hi Sean, you are right. Thanks.
Created 04-28-2014 05:34 PM
Thanks.
I'd also like to know how to reference the Cloudera repo and set the Mahout version to 0.8-cdh5.0.0. When I changed the version to 0.8-cdh5.0.0, the build failed with an error.
Could you give a sample that uses Mahout version 0.8-cdh5.0.0?
My current pom.xml is:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.xuesong.mymahout</groupId>
  <artifactId>myMahout</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>myMahout</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <mahout.version>0.8</mahout.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>${mahout.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-integration</artifactId>
      <version>${mahout.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.mortbay.jetty</groupId>
          <artifactId>jetty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.cassandra</groupId>
          <artifactId>cassandra-all</artifactId>
        </exclusion>
        <exclusion>
          <groupId>me.prettyprint</groupId>
          <artifactId>hector-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
Created 04-28-2014 10:16 PM
(If you encounter an error, you should state what the error is, but I assume it is "artifact not found" here.)
Add this at the end of your pom.xml file, just before the closing </project> tag, so that Maven knows to look in the Cloudera repo:
<repositories>
  <repository>
    <id>cloudera.repo</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <name>Cloudera Repositories</name>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
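Together with the version change suggested earlier in the thread, a minimal sketch of the updated <properties> block from the pom.xml above might look like the following. It assumes mahout-integration is also published in the Cloudera repo under the same 0.8-cdh5.0.0 version, which this thread only states for mahout-core, so adjust if Maven cannot resolve it:

```xml
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <!-- Was 0.8. The CDH 5 build of Mahout contains the Hadoop 2 modifications. -->
  <!-- Assumption: every artifact that uses ${mahout.version} exists at this version. -->
  <mahout.version>0.8-cdh5.0.0</mahout.version>
</properties>
```

Because both dependencies already read their version from ${mahout.version}, changing the property in one place should be enough; the dependency entries themselves can stay as they are.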
Created 04-28-2014 10:47 PM
You are so kind. Thanks for your help.