How to list all vectors from a cluster

New Contributor

Hi,

I have a piece of code that creates a few clusters of vectors.

When I run it, the log shows that two clusters were created, each with its own center:

INFO  CanopyDriver - Build Clusters Input: C:/Users/xxxxxxx/Documents/jboss-as-7.1.1.Final/jboss-as-7.1.1.Final/bin/BI/synthetic_control.seq Out: C:/Users/xxxxxxx/Documents/jboss-as-7.1.1.Final/jboss-as-7.1.1.Final/bin/BI/output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@7faf9b87 t1: 5.0 t2: 9.0
DEBUG CanopyClusterer - Created new Canopy:0 at center:[0.100, 1.000]
DEBUG CanopyClusterer - Added point: [0.100, 0.900] to canopy: C-0
DEBUG CanopyClusterer - Added point: [0.100, 0.950] to canopy: C-0
DEBUG CanopyClusterer - Created new Canopy:1 at center:[12.300, 12.400]
DEBUG CanopyClusterer - Added point: [12.700, 12.900] to canopy: C-1
DEBUG CanopyDriver - Writing Canopy:C-0 center:[0.100, 0.950] numPoints:3 radius:[1:0.041]
DEBUG CanopyDriver - Writing Canopy:C-1 center:[12.500, 12.650] numPoints:2 radius:[0.200, 0.250]

I also wrote a piece of code that prints the two clusters with their centers:

	
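private static final String partMDir = outputDir + "\\" + "clusters-0-final" + "\\part-r-00000";

public void printClusters() {
    SequenceFile.Reader readerSequence = null;
    try {
        // Each record in clusters-0-final/part-r-00000 is (cluster name, ClusterWritable)
        readerSequence = new SequenceFile.Reader(fs, new Path(partMDir), conf);
        Text clusterName = new Text();
        ClusterWritable centerVector = new ClusterWritable();
        while (readerSequence.next(clusterName, centerVector)) {
            // Printing the Cluster shows its identifier and center
            System.out.println(centerVector.getValue() + " is a center of " + clusterName);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the reader even if reading fails
        if (readerSequence != null) {
            try { readerSequence.close(); } catch (IOException ignored) { }
        }
    }
}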

The result:

C-0: {0:0.10000000000000002,1:0.9499999999999998} is a center of C-0
C-1: {0:12.5,1:12.65} is a center of C-1

I would also like to list all the elements of each cluster. I checked a few methods of the Text class, but I did not find anything.

Thank you in advance.

Re: How to list all vectors from a cluster

Master Collaborator

Text is a Hadoop class, so it has nothing specific to clusters.

I am not aware of a job that emits the cluster assignments. It would not be hard to write one, since a point's cluster assignment is just its nearest cluster, but I believe it is something you would have to create yourself.

Re: How to list all vectors from a cluster

New Contributor

Hi,

Thank you for your answer.

In fact, I am trying to do it this way. From the field ClusterWritable centerVector, I get the Cluster like this:

Cluster current = centerVector.getValue();

I cannot find a method that lists all the elements.

When I use this method:

System.out.println(current.asFormatString(null));

I get more information, but not enough:

C-1{n=2 c=[12.500, 12.650] r=[0.200, 0.250]}

Maybe I should ask a different question:

Is it possible to do this the normal Java way, I mean with a method such as cluster.getAllVectors() or something similar?

I don't think I should need to implement a function that takes all points whose distance from the center is smaller than r...

Re: How to list all vectors from a cluster

Master Collaborator

Well, here the cluster is represented just by its centroid.

All of the elements are simply all the data in your input. You would loop over those points and assign each one to its nearest cluster.
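The reply above describes the whole approach: a cluster is stored only as its centroid, so listing its members means assigning every input point to the nearest centroid. A minimal sketch of that loop in plain Java follows. It assumes the centroids and input points have already been read into `double[]` arrays, which stand in for Mahout `Vector`s so the sketch runs without Hadoop or Mahout; the class name `NearestCentroidGrouping` is hypothetical. The sample data are the centers printed by `printClusters()` and the points from the DEBUG log in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: assigns each input point to its
// nearest centroid and collects the members of every cluster. Plain double[]
// arrays stand in for Mahout Vectors so the sketch runs without Hadoop.
class NearestCentroidGrouping {

    // Euclidean distance, the measure named in the CanopyDriver log line.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Maps cluster id -> member points. Assumes at least one centroid;
    // on a tie the centroid inserted first wins.
    static Map<String, List<double[]>> group(Map<String, double[]> centroids,
                                             List<double[]> points) {
        Map<String, List<double[]>> members = new LinkedHashMap<>();
        for (String id : centroids.keySet()) {
            members.put(id, new ArrayList<>());
        }
        for (double[] p : points) {
            String best = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, double[]> c : centroids.entrySet()) {
                double d = euclidean(p, c.getValue());
                if (d < bestDist) {
                    bestDist = d;
                    best = c.getKey();
                }
            }
            members.get(best).add(p);
        }
        return members;
    }

    public static void main(String[] args) {
        // Centers printed by printClusters() and points from the DEBUG log.
        Map<String, double[]> centroids = new LinkedHashMap<>();
        centroids.put("C-0", new double[] {0.10, 0.95});
        centroids.put("C-1", new double[] {12.50, 12.65});
        List<double[]> points = Arrays.asList(
                new double[] {0.100, 1.000}, new double[] {0.100, 0.900},
                new double[] {0.100, 0.950}, new double[] {12.300, 12.400},
                new double[] {12.700, 12.900});
        for (Map.Entry<String, List<double[]>> e : group(centroids, points).entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue().size() + " points):");
            for (double[] p : e.getValue()) {
                System.out.println("  " + Arrays.toString(p));
            }
        }
    }
}
```

On that data the sketch puts three points in C-0 and two in C-1, which agrees with the `numPoints` values in the log. Euclidean distance is used only because it is the measure named in the `CanopyDriver` log line; with a different distance measure the comparison inside the loop would change accordingly.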

Re: How to list all vectors from a cluster

New Contributor

Thank you. I was hoping there was a method that does this...

So, one last thing.

I am iterating through the input file. I have a point (X, Y) and two clusters, c1 and c2, with radii r1 and r2 and centers o1 and o2.

Where should I add this point if

[(X,Y) is inside O(o1, r1)] AND [(X,Y) is inside O(o2, r2)]?
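One way to handle the overlap case, following the earlier advice to assign each point to its nearest cluster, is sketched below. Canopy clustering as an algorithm allows a point to belong to more than one canopy, so the sketch reports both views: every circle that contains the point, and the single nearest center for a hard assignment. The class and method names (`OverlapResolver`, `containing`, `nearest`) and the numeric values are illustrative assumptions, and the radii are treated as scalars as in the question, which differs from the two-component radius vectors shown in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a point may fall inside several circles O(o_i, r_i).
// containing() lists every such circle (overlapping membership);
// nearest() picks a single owner by smallest Euclidean distance to the center.
class OverlapResolver {

    static double distance(double x, double y, double[] center) {
        return Math.hypot(x - center[0], y - center[1]);
    }

    // Indexes of all circles whose scalar radius covers (x, y).
    static List<Integer> containing(double x, double y, double[][] centers, double[] radii) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < centers.length; i++) {
            if (distance(x, y, centers[i]) <= radii[i]) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Index of the closest center, ignoring radii; assumes at least one center
    // and keeps the lower index on a tie.
    static int nearest(double x, double y, double[][] centers) {
        int best = 0;
        for (int i = 1; i < centers.length; i++) {
            if (distance(x, y, centers[i]) < distance(x, y, centers[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {4.0, 0.0}}; // o1, o2 (illustrative values)
        double[] radii = {3.0, 3.0};                    // r1, r2 (illustrative values)
        double x = 1.5, y = 0.0;                        // lies inside both circles
        System.out.println("inside: " + containing(x, y, centers, radii));
        System.out.println("nearest: " + nearest(x, y, centers));
    }
}
```

For the point (1.5, 0) both circles contain it, and `nearest` picks circle 0 because the point is 1.5 away from o1 and 2.5 away from o2.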

 

And a second question: why does getRadius() sometimes return a vector of length 1 and sometimes of length 2? And how can I calculate the distance from the center?
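The radius values in the log can be reproduced if we assume the radius is the per-dimension (population) standard deviation of a cluster's points, so it is a vector with one entry per dimension rather than a single distance. Under that assumption, C-0's x-component is zero because all three C-0 points have x = 0.100; that would explain why the log prints C-0's radius as `[1:0.041]`, which looks like a sparse printout holding only index 1, while C-1 shows both entries. The scalar distance of a point from the center is a separate quantity; the log names `EuclideanDistanceMeasure`, so the sketch uses Euclidean distance. The class name `RadiusCheck` is hypothetical, and plain `double[]` arrays stand in for Mahout vectors.

```java
import java.util.Locale;

// Hypothetical helper (illustrative, not Mahout API).
// Assumption: a cluster's radius is the per-dimension population standard
// deviation of its member points, which reproduces the DEBUG log values.
class RadiusCheck {

    // Component-wise mean of the points, i.e. the cluster center.
    static double[] center(double[][] pts) {
        double[] c = new double[pts[0].length];
        for (double[] p : pts) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= pts.length;
        }
        return c;
    }

    // One standard deviation per dimension: a vector, not a single distance.
    static double[] radius(double[][] pts) {
        double[] c = center(pts);
        double[] r = new double[c.length];
        for (double[] p : pts) {
            for (int i = 0; i < c.length; i++) {
                double d = p[i] - c[i];
                r[i] += d * d;
            }
        }
        for (int i = 0; i < r.length; i++) {
            r[i] = Math.sqrt(r[i] / pts.length);
        }
        return r;
    }

    // Scalar Euclidean distance between a point and a center.
    static double distance(double[] p, double[] c) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - c[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Member points taken from the CanopyClusterer DEBUG lines.
        double[][] c0 = {{0.100, 1.000}, {0.100, 0.900}, {0.100, 0.950}};
        double[][] c1 = {{12.300, 12.400}, {12.700, 12.900}};
        double[] r0 = radius(c0);
        double[] r1 = radius(c1);
        System.out.printf(Locale.ROOT, "C-0 radius: [%.3f, %.3f]%n", r0[0], r0[1]);
        System.out.printf(Locale.ROOT, "C-1 radius: [%.3f, %.3f]%n", r1[0], r1[1]);
        System.out.printf(Locale.ROOT, "distance of [12.7, 12.9] from the C-1 center: %.3f%n",
                distance(c1[1], center(c1)));
    }
}
```

With the log's member points this gives [0.000, 0.041] for C-0 and [0.200, 0.250] for C-1 to three decimals, in line with the `radius` values in the `Writing Canopy` lines. Inside Mahout itself, the configured `DistanceMeasure` (here `EuclideanDistanceMeasure`) provides `distance(Vector, Vector)`, which could be applied to a point and the cluster's `getCenter()` instead of hand-written arithmetic.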

Re: How to list all vectors from a cluster

Master Collaborator

Ah, Suneel on the mailing list is right: ClusterDumper probably does exactly this for you. I'd follow that lead.

Re: How to list all vectors from a cluster

New Contributor

Hi,

Yes, that is what I am trying now...

But all of this is quite complicated, and there is not much documentation for the Java API, let alone examples...

I will try to code it that way.

Thank you very much for your effort.