Explorer
Posts: 13
Registered: ‎10-28-2013
Accepted Solution

Re: NaN Error using arff.vector, canopy/kmeans and clusterdump

Hello Cloudera

 

I have an update on my NaN problem.

 

I have discovered that I can use mahout seqdumper to view the vectors written by the mahout arff.vector command and check whether it is actually writing them properly.

 

I checked all three files: iris.arff.mvc, seeds.arff.mvc and balance.arff.mvc using mahout seqdumper.

 

It turns out that it was in fact mahout arff.vector that created the NaN values, which were then carried through to my kmeans/canopy and clusterdump output.

 

Here is my seqdumper output for the seeds and iris datasets, followed by the balance scale dataset (which works fine):

 

[Masternode@Masterdatanode ~]$ mahout seqdumper -i /user/Masternode/seeds/seeds_data.arff.mvc > /tmp/seeds/dump.txt
16/08/26 01:52:49 WARN driver.MahoutDriver: No seqdumper.props found on classpath, will use command-line arguments only
16/08/26 01:52:50 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/Masternode/seeds/seeds_data.arff.mvc], --startPhase=[0], --tempDir=[temp]}
16/08/26 01:52:53 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/08/26 01:52:53 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/08/26 01:52:53 INFO driver.MahoutDriver: Program took 3827 ms (Minutes: 0.06378333333333333)

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/mahout/mahout-examples-0.9-cdh5.6.0-job.jar
Input Path: /user/Masternode/seeds/seeds_data.arff.mvc
Key class: class org.apache.hadoop.io.LongWritable Value Class: class org.apache.mahout.math.VectorWritable
Key: 0: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:1.0}
Key: 1: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:1.0}
Key: 2: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:1.0}
Key: 3: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:1.0}
Key: 4: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:1.0}
:                                                       :
:                                                       :
Key: 205: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:3.0}
Key: 206: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:3.0}
Key: 207: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:3.0}
Key: 208: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:3.0}
Key: 209: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:NaN,5:NaN,6:NaN,7:3.0}
Count: 210

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[Masternode@Masterdatanode ~]$ mahout seqdumper -i /user/Masternode/iris_data/kmeans3/iris.arff.mvc > /tmp/iris_data/dump.txt
16/08/26 03:52:28 WARN driver.MahoutDriver: No seqdumper.props found on classpath, will use command-line arguments only
16/08/26 03:52:29 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/Masternode/iris_data/kmeans3/iris.arff.mvc], --startPhase=[0], --tempDir=[temp]}
16/08/26 03:52:32 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/08/26 03:52:32 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/08/26 03:52:32 INFO driver.MahoutDriver: Program took 3746 ms (Minutes: 0.062433333333333334)

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/mahout/mahout-examples-0.9-cdh5.6.0-job.jar
Input Path: /user/Masternode/iris_data/kmeans3/iris.arff.mvc
Key class: class org.apache.hadoop.io.LongWritable Value Class: class org.apache.mahout.math.VectorWritable
Key: 0: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:1.0}
Key: 1: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:1.0}
Key: 2: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:1.0}
Key: 3: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:1.0}
Key: 4: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:1.0}
:                                  :
:                                  :
Key: 145: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:3.0}
Key: 146: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:3.0}
Key: 147: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:3.0}
Key: 148: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:3.0}
Key: 149: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:3.0}
Count: 150

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[Masternode@Masterdatanode ~]$ mahout seqdumper -i /user/Masternode/balance/balance.arff.mvc > /tmp/balance/dump.txt
16/08/26 01:58:33 WARN driver.MahoutDriver: No seqdumper.props found on classpath, will use command-line arguments only
16/08/26 01:58:34 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/Masternode/balance/balance.arff.mvc], --startPhase=[0], --tempDir=[temp]}
16/08/26 01:58:37 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/08/26 01:58:37 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
16/08/26 01:58:37 INFO driver.MahoutDriver: Program took 3889 ms (Minutes: 0.06481666666666666)

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/mahout/mahout-examples-0.9-cdh5.6.0-job.jar
Input Path: /user/Masternode/balance/balance.arff.mvc
Key class: class org.apache.hadoop.io.LongWritable Value Class: class org.apache.mahout.math.VectorWritable
Key: 0: Value: {0:1.0,1:1.0,2:1.0,3:1.0,4:2.0}
Key: 1: Value: {0:1.0,1:1.0,2:1.0,3:2.0,4:3.0}
Key: 2: Value: {0:1.0,1:1.0,2:1.0,3:3.0,4:3.0}
Key: 3: Value: {0:1.0,1:1.0,2:1.0,3:4.0,4:3.0}
Key: 4: Value: {0:1.0,1:1.0,2:1.0,3:5.0,4:3.0}
:                                  :
:                                  :
Key: 620: Value: {0:5.0,1:5.0,2:5.0,3:1.0,4:1.0}
Key: 621: Value: {0:5.0,1:5.0,2:5.0,3:2.0,4:1.0}
Key: 622: Value: {0:5.0,1:5.0,2:5.0,3:3.0,4:1.0}
Key: 623: Value: {0:5.0,1:5.0,2:5.0,3:4.0,4:1.0}
Key: 624: Value: {0:5.0,1:5.0,2:5.0,3:5.0,4:2.0}
Count: 625
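
Rather than reading each dump by eye, the NaN check can be scripted. The following is only a sketch: it assumes the dump lines have the `Key: N: Value: {index:value,...}` shape shown above, and the function name `count_nan_vectors` is made up for illustration.

```python
import re

# Matches seqdumper lines such as: Key: 0: Value: {0:NaN,1:NaN,7:1.0}
# (assumed format, taken from the dumps shown in this post)
LINE_RE = re.compile(r"^Key: (\d+): Value: \{(.*)\}\s*$")


def count_nan_vectors(lines):
    """Return (total_vectors, vectors_containing_nan) for seqdumper text lines."""
    total = 0
    with_nan = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip log lines, separators and the Count line
        total += 1
        # Each entry looks like "index:value"; compare the value part to "NaN".
        entries = match.group(2).split(",")
        if any(e.split(":", 1)[1] == "NaN" for e in entries if ":" in e):
            with_nan += 1
    return total, with_nan


sample = [
    "Key: 0: Value: {0:NaN,1:NaN,2:NaN,3:NaN,4:1.0}",
    "Key: 1: Value: {0:1.0,1:1.0,2:1.0,3:2.0,4:3.0}",
    "Count: 2",
]
print(count_nan_vectors(sample))  # (2, 1)
```

To run it on a real dump, pass an open file instead of the sample list, for example the `/tmp/seeds/dump.txt` file written by the seqdumper command above.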

Here are the clusters for the balance scale dataset

 

[Masternode@Masterdatanode ~]$ mahout clusterdump -i /user/Masternode/balance/kmeans-out/clusters-1-final -o /tmp/balance/balance_clusters.txt -p /user/Masternode/balance/kmeans-out/clusteredPoints -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/mahout/mahout-examples-0.9-cdh5.6.0-job.jar
16/08/22 23:05:40 WARN driver.MahoutDriver: No clusterdump.props found on classpath, will use command-line arguments only
16/08/22 23:05:40 INFO common.AbstractJob: Command line arguments: {--dictionaryType=[text], --distanceMeasure=[org.apache.mahout.common.distance.TanimotoDistanceMeasure], --endPhase=[2147483647], --input=[/user/Masternode/balance/kmeans-out/clusters-1-final], --output=[/tmp/balance/balance_clusters.txt], --outputFormat=[TEXT], --pointsDir=[/user/Masternode/balance/kmeans-out/clusteredPoints], --startPhase=[0], --tempDir=[temp]}
16/08/22 23:05:44 INFO clustering.ClusterDumper: Wrote 3 clusters
16/08/22 23:05:44 INFO driver.MahoutDriver: Program took 4136 ms (Minutes: 0.06893333333333333)

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
VL-410{n=213 c=[4.038, 2.131, 2.746, 2.446, 1.737] r=[0.968, 1.058, 1.361, 1.287, 0.917]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=0.39346254907223044]: [2.000, 1.000, 1.000, 1.000, 1.000]
	1.0 : [distance=0.29099665375325723]: [2.000, 1.000, 1.000, 2.000, 2.000]
	1.0 : [distance=0.2737469670505256]: [2.000, 1.000, 2.000, 1.000, 2.000]
	1.0 : [distance=0.2602217061100983]: [2.000, 1.000, 3.000, 1.000, 3.000]
	1.0 : [distance=0.2703429967956935]: [2.000, 1.000, 4.000, 1.000, 3.000]
	: :
VL-82{n=275 c=[1.975, 3.033, 2.669, 3.676, 2.415] r=[1.011, 1.379, 1.344, 1.242, 0.875]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=0.4843955662442676]: [1.000, 1.000, 1.000, 1.000, 2.000]
	1.0 : [distance=0.3310139832226746]: [1.000, 1.000, 1.000, 2.000, 3.000]
	1.0 : [distance=0.25039239421781456]: [1.000, 1.000, 1.000, 3.000, 3.000]
	1.0 : [distance=0.21916087567042697]: [1.000, 1.000, 1.000, 4.000, 3.000]
	1.0 : [distance=0.23026798575608354]: [1.000, 1.000, 1.000, 5.000, 3.000]
	: :
VL-370{n=140 c=[3.429, 4.271, 4.043, 2.486, 1.579] r=[1.283, 0.877, 1.095, 1.344, 0.854]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=0.291345734798266]: [1.000, 2.000, 5.000, 1.000, 3.000]
	1.0 : [distance=0.22857469129979302]: [1.000, 3.000, 4.000, 1.000, 3.000]
	1.0 : [distance=0.22469106732898325]: [1.000, 3.000, 5.000, 1.000, 3.000]
	1.0 : [distance=0.18798153075739144]: [1.000, 3.000, 5.000, 2.000, 3.000]
	1.0 : [distance=0.27974934890340164]: [1.000, 4.000, 2.000, 1.000, 1.000]
	: :

On further inspection of my balance.arff dataset, I noticed that the data values are all integers separated by commas, i.e.

 

@relation balance-scale

@attribute left-weight numeric
@attribute left-distance numeric
@attribute right-weight numeric
@attribute right-distance numeric
@attribute class { L, B, R}

@data
1,1,1,1,B
1,1,1,2,R
1,1,1,3,R
1,1,1,4,R
1,1,1,5,R
1,1,2,1,R
1,1,2,2,R
1,1,2,3,R
1,1,2,4,R
1,1,2,5,R
:      :

Whereas my other datasets, seeds.arff and iris.arff, contain decimal (double/float) values, i.e.

 

@relation seeds

@attribute area numeric
@attribute perimeter numeric
@attribute compactness numeric
@attribute kernel-length numeric
@attribute kernel-width numeric
@attribute asymmetry numeric
@attribute kernel-groove numeric
@attribute class { 1, 2, 3}

@data

15.26,14.84,0.871,5.763,3.312,2.221,5.22,1
14.88,14.57,0.8811,5.554,3.333,1.018,4.956,1
14.29,14.09,0.905,5.291,3.337,2.699,4.825,1
13.84,13.94,0.8955,5.324,3.379,2.259,4.805,1
16.14,14.99,0.9034,5.658,3.562,1.355,5.175,1
14.38,14.21,0.8951,5.386,3.312,2.462,4.956,1
14.69,14.49,0.8799,5.563,3.259,3.586,5.219,1
14.11,14.1,0.8911,5.42,3.302,2.7,5,1
16.63,15.46,0.8747,6.053,3.465,2.04,5.877,1
16.44,15.25,0.888,5.884,3.505,1.969,5.533,1
:                                           :

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

@RELATION iris

@ATTRIBUTE sepallength	numeric
@ATTRIBUTE sepalwidth 	numeric
@ATTRIBUTE petallength 	numeric
@ATTRIBUTE petalwidth	numeric
@ATTRIBUTE class 	{Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
:                   :

So this appears to be what is causing the problem for the mahout arff.vector command: it does not seem to like decimal (double/float) input data.

Is there any solution to this?

 

I am using Cloudera CDH5 version 5.6.0-1.cdh5.6.0.p0.45 and Mahout version 0.9+cdh5.6.0+26.

 

 

Any help is most welcome!

 

 

 

 

 

Re: NaN Error using arff.vector, canopy/kmeans and clusterdump

Hello everybody!

 

I found the answer to this problem not too long after my last post.

 

Since the mahout arff.vector command only seems to like integer input, transform your double and float data into integer values (whole numbers).

 

This is done by multiplying each column of data by 10 raised to the necessary power.

E.g. if you have data like 22.23, multiply by 100; if you have data like 13.854, multiply by 1000.

NB: Always multiply the whole dataset by the same number.
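
A minimal sketch of how this rescaling could be scripted instead of done by hand. The function name `rescale_arff` is made up for illustration, and the sketch assumes that the class label is the last comma-separated field of every data row and should be left alone. It rounds to the nearest whole number, so values with more decimal places than the chosen power of 10 lose precision (for example, 0.8811 times 1000 becomes 881).

```python
def rescale_arff(lines, factor):
    """Multiply the numeric fields of the @data section by `factor`.

    Assumption: the last field of each data row is the class label and is
    passed through unchanged, as are header lines and non-numeric fields.
    """
    out = []
    in_data = False
    for line in lines:
        stripped = line.strip()
        if not in_data:
            out.append(stripped)
            if stripped.lower() == "@data":
                in_data = True
            continue
        if not stripped or stripped.startswith("%"):
            out.append(stripped)  # keep blank lines and ARFF comments
            continue
        fields = stripped.split(",")
        scaled = []
        for field in fields[:-1]:
            try:
                # round() with one argument returns an int
                scaled.append(str(round(float(field) * factor)))
            except ValueError:
                scaled.append(field)  # e.g. "?" for a missing value
        scaled.append(fields[-1])  # class label, untouched
        out.append(",".join(scaled))
    return out


iris = ["@RELATION iris", "@DATA", "5.1,3.5,1.4,0.2,Iris-setosa"]
print(rescale_arff(iris, 10)[-1])  # 51,35,14,2,Iris-setosa
```

The same factor is applied to every column, in line with the note above about multiplying the whole dataset by one number.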

 

Multiplying by a common factor just rescales the data without changing its characteristics.
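
A small check of that claim, as a sketch under stated assumptions: it uses plain Euclidean distance (other distance measures may behave differently) and three iris rows from earlier in this thread, numeric columns only. The helper names `euclidean` and `scale` are made up for illustration. Note that this only holds for columns that are all scaled by the same factor; a column left unscaled, such as a numeric class label, ends up with relatively less weight.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale(vec, factor):
    """Multiply every component of a vector by the same factor."""
    return [v * factor for v in vec]


# Three iris rows (numeric columns only).
p = [5.1, 3.5, 1.4, 0.2]
q = [4.9, 3.0, 1.4, 0.2]
r = [5.4, 3.9, 1.7, 0.4]

factor = 10
d_pq = euclidean(p, q)
d_pr = euclidean(p, r)
d_pq_scaled = euclidean(scale(p, factor), scale(q, factor))
d_pr_scaled = euclidean(scale(p, factor), scale(r, factor))

# Each distance grows by the factor, so the ordering of distances is preserved.
print(math.isclose(d_pq_scaled, factor * d_pq))  # True
print(math.isclose(d_pr_scaled, factor * d_pr))  # True
print((d_pq < d_pr) == (d_pq_scaled < d_pr_scaled))  # True
```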

 

Here is my rescaled iris data:

 

 

@RELATION iris

@ATTRIBUTE sepallength	numeric
@ATTRIBUTE sepalwidth 	numeric
@ATTRIBUTE petallength 	numeric
@ATTRIBUTE petalwidth	numeric
@ATTRIBUTE class 	{Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA

35,14,2,51,Iris-setosa
30,14,2,49,Iris-setosa
32,13,2,47,Iris-setosa
31,15,2,46,Iris-setosa
36,14,2,50,Iris-setosa
39,17,4,54,Iris-setosa
:                 :

And here is my rescaled seeds data:

 

@RELATION seeds

@ATTRIBUTE area	numeric
@ATTRIBUTE perimeter 	numeric
@ATTRIBUTE compactness	numeric
@ATTRIBUTE kernel_length numeric
@ATTRIBUTE kernel_width	numeric
@ATTRIBUTE asymmetry_coefficient numeric
@ATTRIBUTE kernel_groove numeric
@ATTRIBUTE class {1,2,3}

@DATA

14840,15260,5763,3312,2221,5220,871,1
14570,14880,5554,3333,1018,4956,881,1
14090,14290,5291,3337,2699,4825,905,1
13940,13840,5324,3379,2259,4805,896,1
14990,16140,5658,3562,1355,5175,903,1
14210,14380,5386,3312,2462,4956,895,1
:                                    :

 

After this, mahout arff.vector works fine, and mahout seqdumper produces the expected output:

 

For the iris data

 

Input Path: hdfs://childnode1:8020/user/Masternode/iris_data/kmeansout/clusteredPoints/part-m-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable
Key: 4: Value: wt: 1.0 distance: 1.4694216549377501  vec: [35.000, 14.000, 2.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.381689171997383  vec: [30.000, 14.000, 2.000, 49.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.123008610226189  vec: [32.000, 13.000, 2.000, 47.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 5.188371613521854  vec: [31.000, 15.000, 2.000, 46.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 1.9796969465046923  vec: [36.000, 14.000, 2.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 6.838069903123147  vec: [39.000, 17.000, 4.000, 54.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.152011560677545  vec: [34.000, 14.000, 3.000, 46.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 0.5993329625508677  vec: [34.000, 15.000, 2.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 8.009943820027594  vec: [29.000, 14.000, 2.000, 44.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.6659514453958333  vec: [31.000, 15.000, 1.000, 49.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.878442374364758  vec: [37.000, 15.000, 2.000, 54.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 2.513801901502982  vec: [34.000, 16.000, 2.000, 48.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.919268238264621  vec: [30.000, 14.000, 1.000, 48.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 9.090610540552246  vec: [30.000, 11.000, 1.000, 43.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 10.201921387660274  vec: [40.000, 12.000, 2.000, 58.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 12.130919173747719  vec: [44.000, 15.000, 4.000, 57.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 6.624137679728539  vec: [39.000, 13.000, 4.000, 54.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 1.5097019573412485  vec: [35.000, 14.000, 3.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 8.284877790287531  vec: [38.000, 17.000, 3.000, 57.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.989887216451146  vec: [38.000, 15.000, 3.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.6172719218168305  vec: [34.000, 17.000, 2.000, 54.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.3762701313727606  vec: [37.000, 15.000, 4.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 6.443539400050163  vec: [36.000, 10.000, 2.000, 46.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.7946277814826366  vec: [33.000, 17.000, 5.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.845534026296767  vec: [34.000, 19.000, 2.000, 48.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.4180538702010885  vec: [30.000, 16.000, 2.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 2.078268510082371  vec: [34.000, 16.000, 4.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 2.181559075523739  vec: [35.000, 15.000, 2.000, 52.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 2.097426995153822  vec: [34.000, 14.000, 2.000, 52.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.019850743497717  vec: [32.000, 16.000, 2.000, 47.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.049592572099055  vec: [31.000, 16.000, 2.000, 48.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.25666536152411  vec: [34.000, 15.000, 4.000, 54.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 7.244252894536452  vec: [41.000, 15.000, 1.000, 52.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 9.28219801555635  vec: [42.000, 14.000, 2.000, 55.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.6659514453958333  vec: [31.000, 15.000, 1.000, 49.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.4524194414930762  vec: [32.000, 12.000, 2.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 5.287645979072288  vec: [35.000, 13.000, 2.000, 55.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.6659514453958333  vec: [31.000, 15.000, 1.000, 49.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 7.555077762670502  vec: [30.000, 13.000, 2.000, 44.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 1.1131936040060575  vec: [34.000, 15.000, 2.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 1.9181240835774944  vec: [35.000, 13.000, 3.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 12.39351443296044  vec: [23.000, 13.000, 3.000, 45.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 6.6602702647864795  vec: [32.000, 13.000, 2.000, 44.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 3.898615138738256  vec: [35.000, 16.000, 6.000, 50.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 6.076117181226863  vec: [38.000, 19.000, 4.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.737003272111904  vec: [30.000, 14.000, 3.000, 48.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.185594342503789  vec: [38.000, 16.000, 2.000, 51.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.673242985336684  vec: [32.000, 14.000, 2.000, 46.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 4.113295515763298  vec: [37.000, 15.000, 2.000, 53.000, 1.000]
Key: 4: Value: wt: 1.0 distance: 1.413930691370691  vec: [33.000, 14.000, 2.000, 50.000, 1.000]
Key: 128: Value: wt: 1.0 distance: 12.27183013149531  vec: [32.000, 47.000, 14.000, 70.000, 2.000]
Key: 128: Value: wt: 1.0 distance: 6.845135447338132  vec: [32.000, 45.000, 15.000, 64.000, 2.000]
Key: 100: Value: wt: 1.0 distance: 10.234304921679705  vec: [31.000, 49.000, 15.000, 69.000, 2.000]
Key: 128: Value: wt: 1.0 distance: 7.318849411742204  vec: [23.000, 40.000, 13.000, 55.000, 2.000]
Key: 128: Value: wt: 1.0 distance: 6.3893365248584155  vec: [28.000, 46.000, 15.000, 65.000, 2.000]
Key: 128: Value: wt: 1.0 distance: 2.7032373546600645  vec: [28.000, 45.000, 13.000, 57.000, 2.000]
Key: 128: Value: wt: 1.0 distance: 7.648597295107581  vec: [33.000, 47.000, 16.000, 63.000, 2.000]
Key: 128: Value: wt: 1.0 distance: 15.840467020307052  vec: [24.000, 33.000, 10.000, 49.000, 2.000]
:                                                       :
: :
Count: 150

 

And for the seeds data

 

Input Path: hdfs://childnode1:8020/user/Masternode/seeds/kmeans_out/clusteredPoints/part-m-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable
Key: 24: Value: wt: 1.0 distance: 861.9071668766143  vec: [14840.000, 15260.000, 5763.000, 3312.000, 2221.000, 5220.000, 871.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1668.6900610565315  vec: [14570.000, 14880.000, 5554.000, 3333.000, 1018.000, 4956.000, 881.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 694.0022557455687  vec: [14090.000, 14290.000, 5291.000, 3337.000, 2699.000, 4825.000, 905.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1137.750249826391  vec: [13940.000, 13840.000, 5324.000, 3379.000, 2259.000, 4805.000, 896.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2066.340415883421  vec: [14990.000, 16140.000, 5658.000, 3562.000, 1355.000, 5175.000, 903.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 508.48008032866846  vec: [14210.000, 14380.000, 5386.000, 3312.000, 2462.000, 4956.000, 895.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 939.0255545463335  vec: [14490.000, 14690.000, 5563.000, 3259.000, 3586.000, 5219.000, 880.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 693.4040772257339  vec: [14100.000, 14110.000, 5420.000, 3302.000, 2700.000, 5000.000, 891.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2457.5552311826623  vec: [15460.000, 16630.000, 6053.000, 3465.000, 2040.000, 5877.000, 875.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2136.715539711948  vec: [15250.000, 16440.000, 5884.000, 3505.000, 1969.000, 5533.000, 888.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2037.64908599264  vec: [14850.000, 15260.000, 5714.000, 3242.000, 4543.000, 5314.000, 870.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1183.0427661481044  vec: [14160.000, 14030.000, 5438.000, 3201.000, 1717.000, 5001.000, 880.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1668.9093470760495  vec: [14020.000, 13890.000, 5439.000, 3199.000, 3986.000, 4738.000, 888.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1129.8172629441638  vec: [14060.000, 13780.000, 5479.000, 3156.000, 3136.000, 4872.000, 876.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1114.6345778285634  vec: [14050.000, 13740.000, 5482.000, 3114.000, 2932.000, 4825.000, 874.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1616.574316222106  vec: [14280.000, 14590.000, 5351.000, 3333.000, 4185.000, 4781.000, 899.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 2237.7216511574297  vec: [13830.000, 13990.000, 5119.000, 3383.000, 5234.000, 4781.000, 918.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1533.017981297028  vec: [14750.000, 15690.000, 5527.000, 3514.000, 1599.000, 5046.000, 906.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1141.7747846041293  vec: [14210.000, 14700.000, 5205.000, 3466.000, 1767.000, 4649.000, 915.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 1073.5112523108057  vec: [13570.000, 12720.000, 5226.000, 3049.000, 4102.000, 4914.000, 869.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 673.1032757822736  vec: [14400.000, 14160.000, 5658.000, 3129.000, 3072.000, 5176.000, 858.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 588.5794743674002  vec: [14260.000, 14110.000, 5520.000, 3168.000, 2688.000, 5219.000, 872.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2307.6221450474227  vec: [14900.000, 15880.000, 5618.000, 3507.000, 765.000, 5091.000, 899.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 3165.469289919653  vec: [13230.000, 12080.000, 5099.000, 2936.000, 1415.000, 4961.000, 866.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1022.3111169642691  vec: [14760.000, 15010.000, 5789.000, 3245.000, 1791.000, 5001.000, 866.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2453.594725178613  vec: [15160.000, 16190.000, 5833.000, 3421.000, 903.000, 5307.000, 885.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 1842.0639979312034  vec: [13760.000, 13020.000, 5395.000, 3026.000, 3373.000, 4825.000, 864.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2127.2694725299393  vec: [13670.000, 12740.000, 5395.000, 2956.000, 2504.000, 4869.000, 856.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 638.121303238346  vec: [14180.000, 14110.000, 5541.000, 3221.000, 2754.000, 5038.000, 882.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1570.180781482017  vec: [14020.000, 13450.000, 5516.000, 3065.000, 3531.000, 5097.000, 860.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2442.635062314189  vec: [13820.000, 13160.000, 5454.000, 2975.000, 855.000, 5056.000, 866.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1252.115173918792  vec: [14940.000, 15490.000, 5757.000, 3371.000, 3412.000, 5228.000, 872.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1405.0316200008115  vec: [14410.000, 14090.000, 5717.000, 3186.000, 3920.000, 5299.000, 853.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 954.5724423251518  vec: [14170.000, 13940.000, 5585.000, 3150.000, 2124.000, 5012.000, 873.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 729.6383182264915  vec: [14680.000, 15050.000, 5712.000, 3328.000, 2129.000, 5360.000, 878.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1655.6627215851186  vec: [15000.000, 16120.000, 5709.000, 3485.000, 2270.000, 5443.000, 1000.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1818.907545412335  vec: [15270.000, 16200.000, 5826.000, 3464.000, 2823.000, 5527.000, 873.000, 1.000]
Key: 76: Value: wt: 1.0 distance: 2107.009330735528  vec: [15380.000, 17080.000, 5832.000, 3683.000, 2956.000, 5484.000, 908.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 512.998557374996  vec: [14520.000, 14800.000, 5656.000, 3288.000, 3112.000, 5309.000, 882.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 3176.186184965435  vec: [14170.000, 14280.000, 5397.000, 3298.000, 6685.000, 5001.000, 894.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1291.054421730103  vec: [13850.000, 13540.000, 5348.000, 3156.000, 2587.000, 5178.000, 887.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1382.5623224377814  vec: [13850.000, 13500.000, 5351.000, 3158.000, 2249.000, 5176.000, 885.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1853.3448494492047  vec: [13550.000, 13160.000, 5138.000, 3201.000, 2461.000, 4783.000, 901.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2315.520560114591  vec: [14860.000, 15500.000, 5877.000, 3396.000, 4711.000, 5528.000, 882.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 695.316069521979  vec: [14540.000, 15110.000, 5579.000, 3462.000, 3128.000, 5180.000, 899.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1478.614814571006  vec: [14040.000, 13800.000, 5376.000, 3155.000, 1560.000, 4961.000, 879.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1508.3478794444047  vec: [14760.000, 15360.000, 5701.000, 3393.000, 1367.000, 5132.000, 886.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 481.97134871269895  vec: [14560.000, 14990.000, 5570.000, 3377.000, 2958.000, 5175.000, 888.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 183.71730361238392  vec: [14520.000, 14790.000, 5545.000, 3291.000, 2704.000, 5111.000, 882.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 630.6956414080009  vec: [14670.000, 14860.000, 5678.000, 3258.000, 2129.000, 5351.000, 868.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1346.3622667032168  vec: [14400.000, 14430.000, 5585.000, 3272.000, 3975.000, 5144.000, 875.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 3192.1534037702677  vec: [14910.000, 15780.000, 5674.000, 3434.000, 5593.000, 5136.000, 892.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1513.9463214768837  vec: [14610.000, 14490.000, 5715.000, 3113.000, 4116.000, 5396.000, 854.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 778.4078250448471  vec: [14280.000, 14330.000, 5504.000, 3199.000, 3328.000, 5224.000, 883.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1243.4189505293368  vec: [14600.000, 14520.000, 5741.000, 3113.000, 1481.000, 5487.000, 856.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 915.6820426338833  vec: [14770.000, 15030.000, 5702.000, 3212.000, 1933.000, 5439.000, 866.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 365.8725763036213  vec: [14350.000, 14460.000, 5388.000, 3377.000, 2802.000, 5044.000, 882.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1551.4815116461734  vec: [14430.000, 14920.000, 5384.000, 3412.000, 1142.000, 5088.000, 901.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1041.0816986203454  vec: [14770.000, 15380.000, 5662.000, 3419.000, 1999.000, 5222.000, 886.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 3069.1391786047893  vec: [13470.000, 12110.000, 5159.000, 3032.000, 1502.000, 4519.000, 839.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 2234.4099731961946  vec: [12860.000, 11420.000, 5008.000, 2850.000, 2700.000, 4607.000, 868.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 2723.181787028119  vec: [12630.000, 11230.000, 4902.000, 2879.000, 2269.000, 4703.000, 884.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 1679.8632384057453  vec: [13190.000, 12360.000, 5076.000, 3042.000, 3220.000, 4605.000, 892.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 1525.05309032792  vec: [13840.000, 13220.000, 5395.000, 3070.000, 4157.000, 5088.000, 868.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2603.1739906169455  vec: [13570.000, 12780.000, 5262.000, 3026.000, 1176.000, 4782.000, 872.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 2164.8100360288104  vec: [13500.000, 12880.000, 5139.000, 3119.000, 2352.000, 4607.000, 888.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1379.1305505369796  vec: [14370.000, 14340.000, 5630.000, 3190.000, 1313.000, 5150.000, 873.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 802.2824508737174  vec: [14290.000, 14010.000, 5609.000, 3158.000, 2217.000, 5132.000, 862.000, 1.000]
Key: 24: Value: wt: 1.0 distance: 1230.3844556714478  vec: [14390.000, 14370.000, 5569.000, 3153.000, 1464.000, 5300.000, 873.000, 1.000]
Key: 177: Value: wt: 1.0 distance: 1533.2295745642984  vec: [13750.000, 12730.000, 5412.000, 2882.000, 3533.000, 5067.000, 846.000, 1.000]
Key: 76: Value: wt: 1.0 distance: 1242.0782987132768  vec: [15980.000, 17630.000, 6191.000, 3561.000, 4076.000, 6060.000, 867.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 2284.8314552331644  vec: [15670.000, 16840.000, 5998.000, 3484.000, 4675.000, 5877.000, 862.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 1865.3220945799494  vec: [15730.000, 17260.000, 5978.000, 3594.000, 4539.000, 5791.000, 876.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 802.7846465017649  vec: [16260.000, 19110.000, 6154.000, 3930.000, 2936.000, 6079.000, 908.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 2130.8931341845782  vec: [15510.000, 16820.000, 6017.000, 3486.000, 4004.000, 5841.000, 879.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 2497.1535713990975  vec: [15620.000, 16770.000, 5927.000, 3438.000, 4920.000, 5795.000, 864.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 1519.304200689462  vec: [15910.000, 17320.000, 6064.000, 3403.000, 3824.000, 5922.000, 860.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 2415.437198684845  vec: [17230.000, 20710.000, 6579.000, 3814.000, 4451.000, 6451.000, 876.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 1538.7977024244756  vec: [16490.000, 18940.000, 6445.000, 3639.000, 5064.000, 6362.000, 875.000, 2.000]
Key: 76: Value: wt: 1.0 distance: 1983.9629595142953  vec: [15550.000, 17120.000, 5850.000, 3566.000, 2858.000, 5746.000, 889.000, 2.000]
:                                           :
: :
Count: 210

And here is the mahout clusterdump output, i.e. the clusters:

 

For the iris data

 

VL-128{n=62 c=[27.484, 43.935, 14.339, 59.016, 2.226] r=[2.939, 5.048, 2.951, 4.626, 0.418]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=12.27183013149531]: [32.000, 47.000, 14.000, 70.000, 2.000]
	1.0 : [distance=6.845135447338132]: [32.000, 45.000, 15.000, 64.000, 2.000]
	1.0 : [distance=7.318849411742204]: [23.000, 40.000, 13.000, 55.000, 2.000]
	:                                :
        :                                :
VL-4{n=50 c=[34.180, 14.640, 2.440, 50.060, 1.000] r=[0:3.772, 1:1.718, 2:1.061, 3:3.489]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=1.4694216549377501]: [35.000, 14.000, 2.000, 51.000, 1.000]
	1.0 : [distance=4.381689171997383]: [30.000, 14.000, 2.000, 49.000, 1.000]
	1.0 : [distance=4.123008610226189]: [32.000, 13.000, 2.000, 47.000, 1.000]
        :                                 :
        :                                 :
VL-100{n=38 c=[30.737, 57.421, 20.711, 68.500, 2.947] r=[2.862, 4.821, 2.762, 4.876, 0.223]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=10.234304921679705]: [31.000, 49.000, 15.000, 69.000, 2.000]
	1.0 : [distance=8.516482308683988]: [30.000, 50.000, 17.000, 67.000, 2.000]
	1.0 : [distance=7.773365278708711]: [33.000, 60.000, 25.000, 63.000, 3.000]
        :                                  :
        :                                  :

And for the seeds data

VL-76{n=61 c=[16297.377, 18721.803, 6208.934, 3722.672, 3603.590, 6066.098, 885.115, 1.984] r=[470.329, 1087.056, 218.340, 150.079, 1222.928, 222.042, 14.901, 0.127]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=2107.009330735528]: [15380.000, 17080.000, 5832.000, 3683.000, 2956.000, 5484.000, 908.000, 1.000]
	1.0 : [distance=1242.0782987132768]: [15980.000, 17630.000, 6191.000, 3561.000, 4076.000, 6060.000, 867.000, 2.000]
	1.0 : [distance=2284.8314552331644]: [15670.000, 16840.000, 5998.000, 3484.000, 4675.000, 5877.000, 862.000, 2.000]
        :                                   :
        :                                   :
VL-177{n=77 c=[13274.805, 11964.416, 5229.286, 2872.922, 4759.740, 5088.519, 852.208, 2.766] r=[370.481, 808.956, 141.698, 162.027, 1292.441, 182.253, 22.964, 0.643]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=2237.7216511574297]: [13830.000, 13990.000, 5119.000, 3383.000, 5234.000, 4781.000, 918.000, 1.000]
	1.0 : [distance=1073.5112523108057]: [13570.000, 12720.000, 5226.000, 3049.000, 4102.000, 4914.000, 869.000, 1.000]
	1.0 : [distance=1842.0639979312034]: [13760.000, 13020.000, 5395.000, 3026.000, 3373.000, 4825.000, 864.000, 1.000]
        :                                   :
        :                                   :
VL-24{n=72 c=[14460.417, 14648.472, 5563.778, 3277.903, 2648.931, 5192.319, 880.556, 1.194] r=[531.885, 1108.561, 218.877, 158.321, 1093.229, 318.248, 21.182, 0.461]}
	Weight : [props - optional]:  Point:
	1.0 : [distance=861.9071668766143]: [14840.000, 15260.000, 5763.000, 3312.000, 2221.000, 5220.000, 871.000, 1.000]
	1.0 : [distance=1668.6900610565315]: [14570.000, 14880.000, 5554.000, 3333.000, 1018.000, 4956.000, 881.000, 1.000]
	1.0 : [distance=694.0022557455687]: [14090.000, 14290.000, 5291.000, 3337.000, 2699.000, 4825.000, 905.000, 1.000]
        :                                   :
        :                                   :

So, with integer input, the mahout arff.vector command works fine every time!

 

Sorry for the delay in replying!
