<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to print data after canopy clustering in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12562#M17983</link>
    <description>&lt;P&gt;OK, is the file nonempty? I think the data is not in the format you expect then. From skimming the code, it looks like the output is Text + ClusterWritable, not IntWritable +&amp;nbsp;&lt;SPAN&gt;WeightedPropertyVectorWritable. &amp;nbsp;You are trying to print the cluster centroids, right?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 21 May 2014 14:46:42 GMT</pubDate>
    <dc:creator>srowen</dc:creator>
    <dc:date>2014-05-21T14:46:42Z</dc:date>
    <item>
      <title>How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12534#M17976</link>
      <description>&lt;P&gt;Hi Experts,&lt;BR /&gt;&lt;BR /&gt;Here you can find simple piece of code which I wrote:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import java.io.BufferedReader; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import java.io.FileReader; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import java.io.IOException; &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.hadoop.conf.Configuration; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.hadoop.fs.FileSystem; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.hadoop.fs.Path; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.hadoop.io.IntWritable; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.hadoop.io.LongWritable; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.hadoop.io.SequenceFile; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.clustering.Cluster; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.clustering.canopy.CanopyDriver; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.common.distance.EuclideanDistanceMeasure; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.math.RandomAccessSparseVector; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.math.Vector; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;import org.apache.mahout.math.VectorWritable; &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;public class Clustering { 
&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static String root = "C:\\root\\BI\\"; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static String dataDir = root + "synthetic_control.data"; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static String seqDir = root + "synthetic_control.seq"; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static String outputDir = root + "output"; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static String partMDir = outputDir + "\\" &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;+ Cluster.CLUSTERED_POINTS_DIR + "\\part-m-0"; &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static String SEPARATOR = " "; &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private final static int NUMBER_OF_ELEMENTS = 2; &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private Configuration conf; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;private FileSystem fs; &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;public Clustering() throws IOException { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;conf = new Configuration(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;fs = FileSystem.get(conf); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;public void convertToVectorFile() throws IOException { &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;BufferedReader reader = new BufferedReader(new FileReader(dataDir)); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier 
new,courier"&gt;SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;new Path(seqDir), LongWritable.class, VectorWritable.class); &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;String line; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;long counter = 0; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;while ((line = reader.readLine()) != null) { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;String[] c; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;c = line.split(SEPARATOR); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;double[] d = new double[c.length]; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;for (int i = 0; i &amp;lt; NUMBER_OF_ELEMENTS; i++) { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;try { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;d[i] = Double.parseDouble(c[i]); &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} catch (Exception ex) { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;d[i] = 0; &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;Vector vec = new RandomAccessSparseVector(c.length); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;vec.assign(d); &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;VectorWritable writable = new VectorWritable(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;writable.set(vec); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;writer.append(new LongWritable(counter++), writable); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier 
new,courier"&gt;writer.close(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;public void createClusters(double t1, double t2, &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;double clusterClassificationThreshold, boolean runSequential) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;throws ClassNotFoundException, IOException, InterruptedException { &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;EuclideanDistanceMeasure measure = new EuclideanDistanceMeasure(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;Path inputPath = new Path(seqDir); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;Path outputPath = new Path(outputDir); &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;CanopyDriver.run(inputPath, outputPath, measure, t1, t2, runSequential, &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;clusterClassificationThreshold, runSequential); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;public void printClusters() throws IOException { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;SequenceFile.Reader readerSequence = new SequenceFile.Reader(fs, &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;new Path(partMDir), conf); &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;IntWritable key = new IntWritable(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;while (readerSequence.next(key, value)) { &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;System.out.println(value.toString() + " belongs to cluster " 
&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;+ key.toString()); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;readerSequence.close(); &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;} &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;Here we have got 3 different methods.&lt;BR /&gt;&lt;BR /&gt;A. convertToVectorFile()&lt;BR /&gt;&lt;BR /&gt;This function takes a file C:\root\BI\synthetic_control.data and converts it into another file (I was following book Mahout in Action ).&lt;BR /&gt;&lt;BR /&gt;For file:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;0.01 1.0 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;0.1 0.9 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;0.1 0.95 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;12.0 13.0 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;12.5 12.8&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;it generated for me the following structure:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;gt;tree /F &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;C:. 
&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;.synthetic_control.seq.crc &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;synthetic_control.data &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;synthetic_control.seq&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;with log in Eclipse:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG Groups - Creating new Groups object &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG Groups - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG UserGroupInformation - hadoop login &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG UserGroupInformation - hadoop login commit &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG UserGroupInformation - using local user:NTUserPrincipal : xxxxxxxx &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG UserGroupInformation - UGI loginUserxxxxxxx &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG FileSystem - Creating filesystem for file:/// &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG NativeCodeLoader - Trying to load the custom-built native-hadoop library... 
&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG NativeCodeLoader - java.library.path=C:\Program Files\Java\jre7\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:\Program Files (x86)\Intel\iCLS Client\;C:\Program Files\Intel\iCLS Client\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x86;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x64;C:\Program Files\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\MATLAB\R2009b\runtime\win64;C:\Program Files\MATLAB\R2009b\bin;C:\Program Files\TortoiseSVN\bin;C:\Users\xxxxxxxx\Documents\apache-maven-3.1.1\bin;. &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable &lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;B. createClusters()&lt;BR /&gt;&lt;BR /&gt;Next method generates clusters. 
When I run it, it gives me this log:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;INFO CanopyDriver - Build Clusters Input: C:/Users/xxxxxxxx/Desktop/BI/synthetic_control.seq Out: C:/Users/xxxxxxxx/Desktop/BI/output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@2224ece4 t1: 2.0 t2: 3.0 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyClusterer - Created new Canopy:0 at center:[0.010, 1.000] &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyClusterer - Added point: [0.100, 0.900] to canopy: C-0 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyClusterer - Added point: [0.100, 0.950] to canopy: C-0 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyClusterer - Created new Canopy:1 at center:[12.000, 13.000] &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyClusterer - Added point: [12.500, 12.800] to canopy: C-1 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyDriver - Writing Canopy:C-0 center:[0.070, 0.950] numPoints:3 radius:[0.042, 0.041] &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG CanopyDriver - Writing Canopy:C-1 center:[12.250, 12.900] 
numPoints:2 radius:[0.250, 0.100] &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG FileSystem - Starting clear of FileSystem cache with 1 elements. &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG FileSystem - Removing filesystem for file:/// &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG FileSystem - Removing filesystem for file:/// &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;DEBUG FileSystem - Done clearing cache&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;and I can see more files in my directory:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;gt;tree /F &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;C:. &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ .synthetic_control.seq.crc &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ synthetic_control.data &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ synthetic_control.seq &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;└───output &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;├───clusteredPoints &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ .part-m-0.crc &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ part-m-0 &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;│ &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;└───clusters-0-final &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;.part-r-00000.crc &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;._policy.crc &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;part-r-00000 &lt;/FONT&gt;&lt;BR 
/&gt;&lt;FONT face="courier new,courier"&gt;_policy&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;Reading the log, we can see that everything worked well: we have 2 clusters with the proper points.&lt;BR /&gt;&lt;BR /&gt;C. printClusters()&lt;BR /&gt;&lt;BR /&gt;Here is my problem.&lt;BR /&gt;&lt;BR /&gt;I get no errors, but I cannot see any results in the console. My code never enters the while loop.&lt;BR /&gt;&lt;BR /&gt;Thank you for any help&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:59:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12534#M17976</guid>
      <dc:creator>mahoutmaster</dc:creator>
      <dc:date>2022-09-16T08:59:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12542#M17977</link>
      <description>&lt;P&gt;Do the files have data in them? I would double-check that they are not 0-length, but I doubt it. What directory do you find the files in? I suspect its name is like "part-m-00000" but your code appears to be listing "part-m-0"&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 11:27:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12542#M17977</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-05-21T11:27:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12546#M17978</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In fact, I was expecting a file named &lt;STRONG&gt;part-m-00000&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Before I run my program, only the file &lt;STRONG&gt;C:\root\BI\synthetic_control.data &lt;/STRONG&gt;exists, with this data:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;0.01 1.0
0.1 0.9
0.1 0.95
12.0 13.0
12.5 12.8&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;When I run the method &lt;STRONG&gt;convertToVectorFile()&lt;/STRONG&gt;, I can see 2 new files:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;│&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;.synthetic_control.seq.crc&lt;/STRONG&gt;&lt;BR /&gt;│&amp;nbsp;&amp;nbsp; synthetic_control.data&lt;BR /&gt;│&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;synthetic_control.seq&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;When I run the method&amp;nbsp;&lt;STRONG&gt;createClusters()&lt;/STRONG&gt;, I can see a few new files:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;│&amp;nbsp;&amp;nbsp; .synthetic_control.seq.crc&lt;BR /&gt;│&amp;nbsp;&amp;nbsp; synthetic_control.data&lt;BR /&gt;│&amp;nbsp;&amp;nbsp; synthetic_control.seq&lt;BR /&gt;│&lt;BR /&gt;└───&lt;STRONG&gt;output&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ├───&lt;STRONG&gt;clusteredPoints&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; │&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;.part-m-0.crc&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; │&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;part-m-0&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; │&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; └───&lt;STRONG&gt;clusters-0-final&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;.part-r-00000.crc&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;._policy.crc&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;part-r-00000&lt;/STRONG&gt;&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;STRONG&gt;_policy&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Because the files contain a lot of strange characters, I uploaded them here: &lt;A target="_self" href="http://www.sendspace.com/filegroup/3GI05CYBYSuTocKCosPV82%2Bnia%2BMIStv3xcIHtfpzWCxwXqnG0V8Is%2BP6MbTT2QE"&gt;All files&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The file &lt;STRONG&gt;part-m-00000&lt;/STRONG&gt; does not exist...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your help&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 12:10:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12546#M17978</guid>
      <dc:creator>mahoutmaster</dc:creator>
      <dc:date>2014-05-21T12:10:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12548#M17979</link>
      <description>&lt;P&gt;Yeah, that's why I'm confused here. Isn't it the part-r-00000 file that likely has the data?&lt;/P&gt;&lt;P&gt;The format is a binary serialization, so you can't open it as if it were a text file.&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 13:13:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12548#M17979</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-05-21T13:13:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12550#M17980</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I changed this line:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;private final static String partMDir = outputDir + "\\"
+ Cluster.CLUSTERED_POINTS_DIR + "\\part-m-0"; &lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;and now I have got:&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;PRE&gt;private final static String partMDir = outputDir + "\\" + "clusters-0-final" + "\\part-r-00000";&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;When I run my code I have got an exception:&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;PRE&gt;java.io.IOException: wrong value class: wt: 0.0  vec: null is not class org.apache.mahout.clustering.iterator.ClusterWritable
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1936)
	at com.my.packagee.bi.canopy.CanopyClustering.printClusters(CanopyClustering.java:129)
	at com.my.package.bi.BIManager.printClusters(BIManager.java:20)
	at com.my.package.bi.Main.main(Main.java:15)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;which goes from line:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;while (readerSequence.next(key, value)) {&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;I changed a little bit pom.xml file maybe there is some problems, f.e. with version.&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;PRE&gt;                &amp;lt;mahout.version&amp;gt;0.9&amp;lt;/mahout.version&amp;gt;
                
                &amp;lt;mahout.groupid&amp;gt;org.apache.mahout&amp;lt;/mahout.groupid&amp;gt;

                &amp;lt;dependency&amp;gt;
                        &amp;lt;groupId&amp;gt;${mahout.groupid}&amp;lt;/groupId&amp;gt;
                        &amp;lt;artifactId&amp;gt;mahout-core&amp;lt;/artifactId&amp;gt;
                        &amp;lt;version&amp;gt;${mahout.version}&amp;lt;/version&amp;gt;
                &amp;lt;/dependency&amp;gt;
                &amp;lt;dependency&amp;gt;
                        &amp;lt;groupId&amp;gt;${mahout.groupid}&amp;lt;/groupId&amp;gt;
                        &amp;lt;artifactId&amp;gt;mahout-core&amp;lt;/artifactId&amp;gt;
                        &amp;lt;type&amp;gt;test-jar&amp;lt;/type&amp;gt;
                        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
                        &amp;lt;version&amp;gt;${mahout.version}&amp;lt;/version&amp;gt;
                &amp;lt;/dependency&amp;gt;
                &amp;lt;dependency&amp;gt;
                        &amp;lt;groupId&amp;gt;${mahout.groupid}&amp;lt;/groupId&amp;gt;
                        &amp;lt;artifactId&amp;gt;mahout-math&amp;lt;/artifactId&amp;gt;
                        &amp;lt;version&amp;gt;${mahout.version}&amp;lt;/version&amp;gt;
                &amp;lt;/dependency&amp;gt;
                &amp;lt;dependency&amp;gt;
                        &amp;lt;groupId&amp;gt;${mahout.groupid}&amp;lt;/groupId&amp;gt;
                        &amp;lt;artifactId&amp;gt;mahout-math&amp;lt;/artifactId&amp;gt;
                        &amp;lt;type&amp;gt;test-jar&amp;lt;/type&amp;gt;
                        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
                        &amp;lt;version&amp;gt;${mahout.version}&amp;lt;/version&amp;gt;
                &amp;lt;/dependency&amp;gt;
                &amp;lt;dependency&amp;gt;
                        &amp;lt;groupId&amp;gt;${mahout.groupid}&amp;lt;/groupId&amp;gt;
                        &amp;lt;artifactId&amp;gt;mahout-examples&amp;lt;/artifactId&amp;gt;
                        &amp;lt;version&amp;gt;${mahout.version}&amp;lt;/version&amp;gt;
                &amp;lt;/dependency&amp;gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 13:33:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12550#M17980</guid>
      <dc:creator>mahoutmaster</dc:creator>
      <dc:date>2014-05-21T13:33:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12554#M17981</link>
      <description>&lt;P&gt;I think you have found the right file then, but it is saying that it did not generate cluster centers. Maybe the data is too small. This might be better as a question on the Mahout mailing list as to what that means.&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 14:06:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12554#M17981</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-05-21T14:06:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12556#M17982</link>
      <description>&lt;P&gt;Thank you for your message.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not sure if you are right...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here you can see full log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;DEBUG CanopyClusterer - Created new Canopy:0 at center:[0.010, 1.000]
DEBUG CanopyClusterer - Added point: [0.100, 0.900] to canopy: C-0
DEBUG CanopyClusterer - Added point: [0.100, 0.950] to canopy: C-0
DEBUG CanopyClusterer - Created new Canopy:1 at center:[12.000, 13.000]
DEBUG CanopyClusterer - Added point: [12.500, 12.800] to canopy: C-1
DEBUG CanopyDriver - Writing Canopy:C-0 center:[0.070, 0.950] numPoints:3 radius:[0.042, 0.041]
DEBUG CanopyDriver - Writing Canopy:C-1 center:[12.250, 12.900] numPoints:2 radius:[0.250, 0.100]&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;So it seems to be ok...&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 14:15:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12556#M17982</guid>
      <dc:creator>mahoutmaster</dc:creator>
      <dc:date>2014-05-21T14:15:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12562#M17983</link>
      <description>&lt;P&gt;OK, is the file nonempty? I think the data is not in the format you expect then. From skimming the code, it looks like the output is Text + ClusterWritable, not IntWritable +&amp;nbsp;&lt;SPAN&gt;WeightedPropertyVectorWritable. &amp;nbsp;You are trying to print the cluster centroids, right?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 14:46:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12562#M17983</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-05-21T14:46:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to print data after canopy clustering</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12564#M17984</link>
      <description>&lt;P&gt;Thank you for your effort.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;No, this file is not empty: here you can check it &lt;A target="_self" href="http://www.sendspace.com/file/9o6lgz"&gt;part-r-00000&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to see all the vectors with information about cluster for each.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It would be nice to see also centers of the clusters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I changed&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;IntWritable key = new IntWritable();
WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;to this&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Text key = new Text();
ClusterWritable value = new ClusterWritable();&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;I do not get any exception, but the output is:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;org.apache.mahout.clustering.iterator.ClusterWritable@572c4a12 belongs to cluster C-0
org.apache.mahout.clustering.iterator.ClusterWritable@572c4a12 belongs to cluster C-1&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;---&lt;/P&gt;&lt;P&gt;EDIT:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I changed&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;value.toString()&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;to&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;value.getValue()&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;and now, I have got an output:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;C-0: {0:0.07,1:0.9499999999999998} belongs to cluster C-0
C-1: {0:12.25,1:12.9} belongs to cluster C-1&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much !!!!&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2014 14:59:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-print-data-after-canopy-clustering/m-p/12564#M17984</guid>
      <dc:creator>mahoutmaster</dc:creator>
      <dc:date>2014-05-21T14:59:48Z</dc:date>
    </item>
  </channel>
</rss>

