<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Description of the PCA in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106673#M69551</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I executed the following code to obtain a description of the PCA:&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.mllib.feature.PCA
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

val unparseddata = sc.textFile("hdfs:///tmp/epidemiological10.csv")
val data = unparseddata.map { line =&amp;gt;
  val parts = line.split(',').map(_.toDouble)
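  // Note: slice(0, parts.length) keeps every column, so the label (the last column)
  // is also included among the features; parts.init would exclude it if that is not intended.
  LabeledPoint(parts.last, Vectors.dense(parts.slice(0, parts.length)))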
}

val pca = new PCA(5).fit(data.map(_.features))
val projected = data.map(p =&amp;gt; p.copy(features = pca.transform(p.features)))

val collect = projected.collect()
println("Projected vector of principal component:")
collect.foreach { vector =&amp;gt; println(vector)}&lt;/PRE&gt;&lt;P&gt;and I obtained the following result:&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;160.0&lt;/STRONG&gt;,[-226.2602388674248,-28.5763504459316,-167.30588000588938,-169.403316284169,23.09294762015914])&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;176.0&lt;/STRONG&gt;,[-248.89483793051159,-21.97201619037966,-193.69749510702238,-108.81814406079761,20.90854574732602])&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;179.0&lt;/STRONG&gt;,[-253.1354367540671,-29.972928370070743,-244.2610705303066,-129.17921788251297,20.090356540571392])&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;172.7&lt;/STRONG&gt;,[-244.22812858428057,-21.1460977635957,-179.6413565398707,-106.6403738598213,23.450082340280513])&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;I assume the values in brackets are the first five principal components, but I would like to know what the numbers I put in bold mean.&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Laia&lt;/P&gt;</description>
    <pubDate>Wed, 17 Aug 2016 14:06:20 GMT</pubDate>
    <dc:creator>laia_subirats</dc:creator>
    <dc:date>2016-08-17T14:06:20Z</dc:date>
    <item>
      <title>Description of the PCA</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106673#M69551</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I executed the following code to obtain a description of the PCA:&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.mllib.feature.PCA
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

val unparseddata = sc.textFile("hdfs:///tmp/epidemiological10.csv")
val data = unparseddata.map { line =&amp;gt;
  val parts = line.split(',').map(_.toDouble)
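  // Note: slice(0, parts.length) keeps every column, so the label (the last column)
  // is also included among the features; parts.init would exclude it if that is not intended.
  LabeledPoint(parts.last, Vectors.dense(parts.slice(0, parts.length)))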
}

val pca = new PCA(5).fit(data.map(_.features))
val projected = data.map(p =&amp;gt; p.copy(features = pca.transform(p.features)))

val collect = projected.collect()
println("Projected vector of principal component:")
collect.foreach { vector =&amp;gt; println(vector)}&lt;/PRE&gt;&lt;P&gt;and I obtained the following result:&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;160.0&lt;/STRONG&gt;,[-226.2602388674248,-28.5763504459316,-167.30588000588938,-169.403316284169,23.09294762015914])&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;176.0&lt;/STRONG&gt;,[-248.89483793051159,-21.97201619037966,-193.69749510702238,-108.81814406079761,20.90854574732602])&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;179.0&lt;/STRONG&gt;,[-253.1354367540671,-29.972928370070743,-244.2610705303066,-129.17921788251297,20.090356540571392])&lt;/P&gt;&lt;P&gt;(&lt;STRONG&gt;172.7&lt;/STRONG&gt;,[-244.22812858428057,-21.1460977635957,-179.6413565398707,-106.6403738598213,23.450082340280513])&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;I assume the values in brackets are the first five principal components, but I would like to know what the numbers I put in bold mean.&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Laia&lt;/P&gt;</description>
      <pubDate>Wed, 17 Aug 2016 14:06:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106673#M69551</guid>
      <dc:creator>laia_subirats</dc:creator>
      <dc:date>2016-08-17T14:06:20Z</dc:date>
    </item>
    <item>
      <title>Re: Description of the PCA</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106674#M69552</link>
      <description>&lt;P&gt;The bold number is the label from the LabeledPoint. The map that builds projected copies each LabeledPoint, replacing its features member with the principal components and leaving the label untouched. Each output line is therefore a (label, features) pair, where features is your PCA result and label is the original label.&lt;/P&gt;</description>
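      <!--
The behaviour described in this reply can be sketched without a Spark cluster. This is only a minimal illustration under stated assumptions: the LabeledPoint case class below is a hypothetical stand-in for org.apache.spark.mllib.regression.LabeledPoint (it holds a plain Scala Vector rather than an mllib vector), and project is a made-up placeholder for PCAModel.transform that simply keeps the first k values. The point it shows is that copy(features = ...) replaces only the named field, so the label is carried over as it was.

```scala
// Hypothetical stand-in for Spark's LabeledPoint: a label plus a feature vector.
final case class LabeledPoint(label: Double, features: Vector[Double])

object CopyKeepsLabel {
  // Placeholder projection (assumption): keeps the first k values.
  // A fitted PCA model would project onto the principal components instead.
  def project(features: Vector[Double], k: Int): Vector[Double] =
    features.take(k)

  def main(args: Array[String]): Unit = {
    val original = LabeledPoint(160.0, Vector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

    // Same shape as the map in the question: only `features` is replaced.
    val projected = original.copy(features = project(original.features, 5))

    println(projected.label)         // the label of `original`, unchanged
    println(projected.features.size) // number of values kept by the projection
  }
}
```

If only the projected vectors are wanted, mapping to the features alone (for example data.map(p => pca.transform(p.features)) in the original code) would drop the label from the printed output.
      -->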
      <pubDate>Thu, 18 Aug 2016 16:44:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106674#M69552</guid>
      <dc:creator>sball</dc:creator>
      <dc:date>2016-08-18T16:44:20Z</dc:date>
    </item>
    <item>
      <title>Re: Description of the PCA</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106675#M69553</link>
      <description>&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Aug 2016 16:56:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Description-of-the-PCA/m-p/106675#M69553</guid>
      <dc:creator>laia_subirats</dc:creator>
      <dc:date>2016-08-18T16:56:16Z</dc:date>
    </item>
  </channel>
</rss>

