Running Mahout examples problem

zaher_mahdhi — Fri, 16 Sep 2022 15:40:34 GMT

I'm trying to use Mahout for a clustering job, and I've been struggling with it and Maven for a week now. My code works fine in Eclipse on my local machine, but when I build it into a jar and submit it to the cluster, I get errors that seem to come from reading HDFS.

First, I created the directory /user/root/testdata:

hadoop fs -mkdir /user/root/testdata

Then I put the downloaded file synthetic_control.data into it:

hadoop fs -put synthetic_control.data /user/root/testdata/
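
For context on why the data goes under /user/root: on HDFS, a path without a leading slash is resolved against the submitting user's home directory, /user/<username>. The log below reports "Running with default arguments", which presumably means the example reads a relative input such as testdata. The following JDK-only sketch models just that string-level rule; the class and method names are hypothetical and no Hadoop API is called.

```java
// Hypothetical sketch (JDK only, no Hadoop classes): models how HDFS resolves a
// relative path against the user's home directory, /user/<username>.
final class HomeDirResolution {

    // HDFS uses /user/<username> as each user's home and default working directory.
    static String homeDirectory(String user) {
        return "/user/" + user;
    }

    // Absolute paths are returned unchanged; relative ones are appended to the home directory.
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return homeDirectory(user) + "/" + path;
    }

    public static void main(String[] args) {
        // Running as root, the relative input "testdata" maps to /user/root/testdata.
        System.out.println(resolve("testdata", "root"));            // /user/root/testdata
        System.out.println(resolve("/user/root/testdata", "root")); // /user/root/testdata
    }
}
```

Under a different account, say hdfs, the same relative path would point to /user/hdfs/testdata, so the upload location has to match the user that submits the job.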

Finally, I ran the example using:

- the Mahout examples jar from Mahout 0.9, downloaded from the website:

hadoop jar mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

- and the mahout-examples-0.9.0.2.3.4.0-3485-job.jar file found in the Mahout directory on the node:

hadoop jar /usr/hdp/2.3.4.0-3485/mahout/mahout-examples-0.9.0.2.3.4.0-3485-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

In both cases I get this error:

WARNING: Use "yarn jar" to launch YARN applications.
16/03/04 11:57:03 INFO kmeans.Job: Running with default arguments
16/03/04 11:57:05 INFO common.HadoopUtil: Deleting output
16/03/04 11:57:05 INFO kmeans.Job: Preparing Input
16/03/04 11:57:05 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:05 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:07 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0029
16/03/04 11:57:07 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0029
16/03/04 11:57:07 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0029/
16/03/04 11:57:07 INFO mapreduce.Job: Running job: job_1456915204500_0029
16/03/04 11:57:14 INFO mapreduce.Job: Job job_1456915204500_0029 running in uber mode : false
16/03/04 11:57:14 INFO mapreduce.Job:  map 0% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job:  map 100% reduce 0%
16/03/04 11:57:20 INFO mapreduce.Job: Job job_1456915204500_0029 completed successfully
16/03/04 11:57:20 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=129757
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=288502
        HDFS: Number of bytes written=335470
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3457
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3457
        Total vcore-seconds taken by all map tasks=3457
        Total megabyte-seconds taken by all map tasks=3539968
    Map-Reduce Framework
        Map input records=600
        Map output records=600
        Input split bytes=128
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=76
        CPU time spent (ms)=590
        Physical memory (bytes) snapshot=113729536
        Virtual memory (bytes) snapshot=2723696640
        Total committed heap usage (bytes)=62324736
    File Input Format Counters 
        Bytes Read=288374
    File Output Format Counters 
        Bytes Written=335470
16/03/04 11:57:20 INFO kmeans.Job: Running random seed to get initial clusters
16/03/04 11:57:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/03/04 11:57:20 INFO compress.CodecPool: Got brand-new compressor [.deflate]
16/03/04 11:57:21 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed
16/03/04 11:57:21 INFO kmeans.Job: Running KMeans with k = 6
16/03/04 11:57:21 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output
16/03/04 11:57:21 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
16/03/04 11:57:21 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/03/04 11:57:21 INFO impl.TimelineClientImpl: Timeline service address: http://vm2.local:8188/ws/v1/timeline/
16/03/04 11:57:21 INFO client.RMProxy: Connecting to ResourceManager at vm1.local/10.10.10.1:8050
16/03/04 11:57:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/04 11:57:22 INFO input.FileInputFormat: Total input paths to process : 1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 11:57:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456915204500_0030
16/03/04 11:57:22 INFO impl.YarnClientImpl: Submitted application application_1456915204500_0030
16/03/04 11:57:22 INFO mapreduce.Job: The url to track the job: http://vm1.local:8088/proxy/application_1456915204500_0030/
16/03/04 11:57:22 INFO mapreduce.Job: Running job: job_1456915204500_0030
16/03/04 11:57:33 INFO mapreduce.Job: Job job_1456915204500_0030 running in uber mode : false
16/03/04 11:57:33 INFO mapreduce.Job:  map 0% reduce 0%
16/03/04 11:57:37 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
    at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
    at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
    ... 10 more

16/03/04 11:57:42 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_1, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
    at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
    at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
    ... 10 more

16/03/04 11:57:46 INFO mapreduce.Job: Task Id : attempt_1456915204500_0030_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
    at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
    at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
    ... 10 more

16/03/04 11:57:52 INFO mapreduce.Job:  map 100% reduce 100%
16/03/04 11:57:53 INFO mapreduce.Job: Job job_1456915204500_0030 failed with state FAILED due to: Task failed task_1456915204500_0030_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/03/04 11:57:53 INFO mapreduce.Job: Counters: 13
    Job Counters 
        Failed map tasks=4
        Killed reduce tasks=1
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=11687
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=11687
        Total time spent by all reduce tasks (ms)=0
        Total vcore-seconds taken by all map tasks=11687
        Total vcore-seconds taken by all reduce tasks=0
        Total megabyte-seconds taken by all map tasks=11967488
        Total megabyte-seconds taken by all reduce tasks=0
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
    at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I suspect it's a version problem.

Thanks.

Re: Running Mahout examples problem

nsabharwal — Sat, 05 Mar 2016 22:50:07 GMT

@Zaher Mahdhi

Please see https://issues.apache.org/jira/browse/MAHOUT-1658

This was fixed in 0.11.

Re: Running Mahout examples problem

zaher_mahdhi — Mon, 07 Mar 2016 20:47:40 GMT

I resolved this.

Instead of using a path that isn't fully qualified (no scheme or NameNode address), like this:

new Path("/testdata/points")

you have to use the fully qualified HDFS URI of the directory on your cluster:

new Path("hdfs://vm1.local:8020/user/root/testdata/points")
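
The stack trace is consistent with this fix: the FileNotFoundException for output/clusters-0 is raised from RawLocalFileSystem, so inside the map task the scheme-less path was apparently resolved against the local filesystem instead of HDFS. A fully qualified URI carries its own scheme and authority, so it does not depend on which default filesystem the task's configuration ends up with. The JDK-only sketch below models that rule; qualify is a hypothetical helper that only loosely mirrors Hadoop's Path.makeQualified, and the container working directory is a made-up placeholder.

```java
import java.net.URI;

// Hypothetical sketch (JDK only, no Hadoop classes): qualifies a path string against a
// default filesystem URI and a working directory, loosely mirroring Path.makeQualified.
final class PathQualification {

    static URI qualify(String path, URI defaultFs, String workingDir) {
        URI p = URI.create(path);
        // A URI that already names its scheme (e.g. hdfs://host:port/...) is returned unchanged.
        // This sketch does not model a scheme given without an authority.
        if (p.isAbsolute()) {
            return p;
        }
        // A relative path is first placed under the working directory.
        String absPath = path.startsWith("/") ? path : workingDir + "/" + path;
        // The default filesystem then supplies the missing scheme and authority.
        return defaultFs.resolve(absPath);
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://vm1.local:8020");
        URI local = URI.create("file:///");
        String containerDir = "/tmp/container-workdir"; // placeholder, not a real YARN path

        // Default filesystem is HDFS: the relative output lands under /user/root.
        System.out.println(qualify("output/clusters-0", hdfs, "/user/root"));
        // Default filesystem is local disk: the same string points into the container's directory.
        System.out.println(qualify("output/clusters-0", local, containerDir));
        // Fully qualified: the result is the same whatever the default filesystem is.
        System.out.println(qualify("hdfs://vm1.local:8020/user/root/testdata/points", local, containerDir));
    }
}
```

In Hadoop code itself, FileSystem.makeQualified(Path) applies the configured default filesystem in a similar way, which may avoid hard-coding vm1.local:8020, assuming the client's configuration points fs.defaultFS at HDFS.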

Re: Running Mahout examples problem

zaher_mahdhi — Mon, 07 Mar 2016 20:49:01 GMT

@Neeraj Sabharwal Even in 0.11 the problem still exists. See my other answer in this thread.

