Contributor
Posts: 38
Registered: 01-05-2015

Understanding the mahout SSVD output!

Dear Colleagues,

To run SSVD in Mahout, I represented the documents as a TF-IDF matrix using seq2sparse (row indices are the document IDs and column indices are the dictionary IDs, i.e. the word IDs).

This TF-IDF matrix is the input to SSVD.

The SSVD job outputs the matrices U, S, and V (transposed).

How can I interpret this output with respect to the original TF-IDF matrix? Should I multiply the original matrix by U, S, or V?

What conclusions can I draw from it?

Thanks in advance and best regards,

butkiz

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Understanding the mahout SSVD output!

The output is as you say: these are the factors of the SVD. What you
do with them depends on what you're trying to achieve. For example, you
can look at the matrix V S to study term similarities, or U S to study
document similarities.
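For reference, here are the identities behind this suggestion, written for an exact rank-k truncated SVD (Mahout's SSVD computes a randomized approximation, so they hold only approximately for its output). The notation is assumed for illustration: A is the m-by-n TF-IDF matrix (m documents, n terms), k is the number of singular values kept, and S_k is the diagonal matrix of those singular values.

```latex
A \approx U_k S_k V_k^{\top}, \qquad
U_k \in \mathbb{R}^{m \times k}, \quad
S_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad
V_k \in \mathbb{R}^{n \times k}

A V_k = U_k S_k, \qquad A^{\top} U_k = V_k S_k

(U_k S_k)(U_k S_k)^{\top} \approx A A^{\top}, \qquad
(V_k S_k)(V_k S_k)^{\top} \approx A^{\top} A
```

So multiplying the original matrix by V_k projects each document into the k-dimensional latent space and yields U_k S_k, while the rows of V_k S_k are the corresponding term vectors. Their dot products approximate the document-to-document and term-to-term dot products of the original TF-IDF matrix.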

Contributor
Posts: 38
Registered: 01-05-2015

Re: Understanding the mahout SSVD output!

Thanks! I'm trying to figure out which terms are related to a given topic. Should I first multiply V and S and then compute the distances between the resulting "new" vectors? What's your understanding?

Thanks and regards,
butkiz
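A minimal sketch of the approach asked about here (multiply V by S, then compare the resulting term vectors), assuming NumPy is available. `numpy.linalg.svd` (an exact SVD) stands in for the U, S and V outputs of Mahout's SSVD, the small TF-IDF-like matrix is made up, and cosine similarity is just one choice of measure; `latent_vectors` and `cosine_similarity_matrix` are hypothetical helper names.

```python
import numpy as np


def latent_vectors(A, k):
    """Return (U_k S_k, V_k S_k) for a documents-by-terms matrix A."""
    # NumPy returns V already transposed (Vt) and S as a 1-D array s.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = U[:, :k] * s[:k]      # one row per document
    term_vecs = Vt[:k, :].T * s[:k]  # one row per term
    return doc_vecs, term_vecs


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # leave all-zero rows as zeros
    return Xn @ Xn.T


# Toy 4-document x 5-term TF-IDF-like matrix (made-up values).
A = np.array([
    [0.9, 0.8, 0.0, 0.0, 0.1],
    [0.7, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.8, 0.9, 0.0],
    [0.0, 0.0, 0.9, 0.7, 0.2],
])

doc_vecs, term_vecs = latent_vectors(A, k=2)
term_sim = cosine_similarity_matrix(term_vecs)  # term-to-term, 5 x 5
doc_sim = cosine_similarity_matrix(doc_vecs)    # document-to-document, 4 x 4
print(np.round(term_sim, 2))
print(np.round(doc_sim, 2))
```

Cosine similarity compares only the directions of the rows; Euclidean distance on the same rows would also take their lengths into account. The same helper gives document-to-document similarities from the rows of U S.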
Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Understanding the mahout SSVD output!

I suppose you could cluster the term vectors (the rows of V S) for this
purpose, to discover related terms and thus topics.
That said, this is the kind of problem where you would more usually use LDA.
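A minimal sketch of clustering the rows of V S, assuming NumPy is available. As before, `numpy.linalg.svd` stands in for Mahout's SSVD, and the toy matrix and vocabulary are made up. The clustering is plain k-means (Lloyd's algorithm) written out by hand with a deterministic farthest-point start, so it needs no other library; rows are normalized to unit length first so that terms are grouped by direction in the latent space. `kmeans` and `term_topics` are hypothetical helper names.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    # Deterministic farthest-point start: row 0, then the row farthest
    # from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its rows; keep empty clusters in place.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def term_topics(A, terms, k, n_topics):
    """Group terms by clustering the rows of V_k S_k (one row per term)."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_vecs = Vt[:k, :].T * s[:k]
    norms = np.linalg.norm(term_vecs, axis=1, keepdims=True)
    term_vecs = term_vecs / np.where(norms == 0, 1.0, norms)
    labels = kmeans(term_vecs, n_topics)
    groups = {}
    for term, label in zip(terms, labels):
        groups.setdefault(int(label), []).append(term)
    return groups


# Toy documents-by-terms matrix (made-up values): two documents about
# Hadoop, two about music.
terms = ["hadoop", "hdfs", "yarn", "guitar", "drums", "bass"]
A = np.array([
    [0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
    [0.8, 0.9, 0.6, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.9, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.9, 0.6],
])
topics = term_topics(A, terms, k=2, n_topics=2)
for label, words in sorted(topics.items()):
    print(label, words)
```

Each term ends up in exactly one cluster here, whereas LDA models every document as a mixture of topics and every topic as a distribution over terms.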

I know you're using Mahout, but if you ever consider using Spark,
there's a chapter on exactly this in our book:
http://shop.oreilly.com/product/0636920035091.do
