<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Link Analysis using Spark Python in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Link-Analysis-using-Spark-Python/m-p/113465#M38493</link>
    <description>&lt;P&gt;Hi Pedro,&lt;/P&gt;&lt;P&gt;A Python API for GraphX is still missing. However, there is a GitHub project called GraphFrames that provides a higher-level API on top of Spark GraphX: &lt;A href="http://graphframes.github.io/index.html"&gt;(GraphFrames)&lt;/A&gt;. The project claims: "GraphX is to RDDs as GraphFrames are to DataFrames."&lt;/P&gt;&lt;P&gt;I haven't worked with it, but a quick test of their samples with Spark 1.6.2 worked.&lt;/P&gt;&lt;P&gt;Use pyspark like this:&lt;/P&gt;&lt;PRE&gt;pyspark --packages graphframes:graphframes:0.2.0-spark1.6-s_2.10&lt;/PRE&gt;&lt;P&gt;or use Zeppelin and add the dependency to the interpreter configuration.&lt;/P&gt;&lt;P&gt;Maybe this library has what you need.&lt;/P&gt;</description>
    <pubDate>Mon, 22 Aug 2016 18:55:39 GMT</pubDate>
    <dc:creator>bwalter1</dc:creator>
    <dc:date>2016-08-22T18:55:39Z</dc:date>
    <item>
      <title>Link Analysis using Spark Python</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Link-Analysis-using-Spark-Python/m-p/113464#M38492</link>
      <description>&lt;P&gt;Hi,

I need to create some graphs using PySpark for some link analysis research. I have already seen this link:

&lt;A href="http://kukuruku.co/hub/algorithms/social-network-analysis-spark-graphx" target="_blank"&gt;http://kukuruku.co/hub/algorithms/social-network-analysis-spark-graphx&lt;/A&gt;

However, that algorithm is implemented in Scala, which is much harder for me to understand.

Does anyone know of a white paper or tutorial that does link analysis using PySpark?

Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:35:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Link-Analysis-using-Spark-Python/m-p/113464#M38492</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2022-09-16T10:35:54Z</dc:date>
    </item>
    <item>
      <title>Re: Link Analysis using Spark Python</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Link-Analysis-using-Spark-Python/m-p/113465#M38493</link>
      <description>&lt;P&gt;Hi Pedro,&lt;/P&gt;&lt;P&gt;A Python API for GraphX is still missing. However, there is a GitHub project called GraphFrames that provides a higher-level API on top of Spark GraphX: &lt;A href="http://graphframes.github.io/index.html"&gt;(GraphFrames)&lt;/A&gt;. The project claims: "GraphX is to RDDs as GraphFrames are to DataFrames."&lt;/P&gt;&lt;P&gt;I haven't worked with it, but a quick test of their samples with Spark 1.6.2 worked.&lt;/P&gt;&lt;P&gt;Use pyspark like this:&lt;/P&gt;&lt;PRE&gt;pyspark --packages graphframes:graphframes:0.2.0-spark1.6-s_2.10&lt;/PRE&gt;&lt;P&gt;or use Zeppelin and add the dependency to the interpreter configuration.&lt;/P&gt;&lt;P&gt;Maybe this library has what you need.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 18:55:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Link-Analysis-using-Spark-Python/m-p/113465#M38493</guid>
      <dc:creator>bwalter1</dc:creator>
      <dc:date>2016-08-22T18:55:39Z</dc:date>
    </item>
  </channel>
</rss>

