Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Link Analysis using Spark Python


Hi, I need to build some graphs with PySpark for link analysis research. I have already seen this link: http://kukuruku.co/hub/algorithms/social-network-analysis-spark-graphx but that algorithm is implemented in Scala, which is much harder for me to understand. Does anyone know of a white paper or tutorial that does link analysis with PySpark? Thanks!

1 ACCEPTED SOLUTION


Hi Pedro,

A Python API for Spark GraphX is still missing. However, there is a project on GitHub called GraphFrames that provides a higher-level API on top of Spark GraphX. The project claims: "GraphX is to RDDs as GraphFrames are to DataFrames."

I haven't worked with it, but a quick test of their samples with Spark 1.6.2 worked.

Start pyspark with the GraphFrames package like this:

pyspark --packages graphframes:graphframes:0.2.0-spark1.6-s_2.10

Alternatively, use Zeppelin and add the dependency to the interpreter configuration.
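Once the shell is running with the package loaded, a link analysis pass could look roughly like the sketch below. It is only an illustration, not something taken from the GraphFrames samples: the helper names (vertex_rows, rank_links) and the sample edges are made up, and it assumes GraphFrame and its pageRank method behave as described in the GraphFrames documentation (vertices need an "id" column, edges need "src" and "dst").

```python
def vertex_rows(edges):
    """Collect the distinct node ids that appear in a list of (src, dst) pairs."""
    ids = sorted({node for edge in edges for node in edge})
    # One single-column row per vertex, matching the "id" column GraphFrames expects.
    return [(node,) for node in ids]


def rank_links(sql_ctx, edges, max_iter=10):
    """Build a GraphFrame from (src, dst) pairs and return PageRank per vertex.

    sql_ctx is assumed to be the SQLContext that the pyspark shell provides.
    """
    # Imported lazily: the module is only available when pyspark was started
    # with the graphframes package (see the --packages command above).
    from graphframes import GraphFrame

    vertices = sql_ctx.createDataFrame(vertex_rows(edges), ["id"])
    links = sql_ctx.createDataFrame(edges, ["src", "dst"])
    graph = GraphFrame(vertices, links)

    # pageRank returns a new GraphFrame whose vertices carry a "pagerank" column.
    ranked = graph.pageRank(resetProbability=0.15, maxIter=max_iter)
    return ranked.vertices.select("id", "pagerank")


# Hypothetical sample data: each tuple is a link from the first node to the second.
SAMPLE_EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
```

In the pyspark 1.6 shell, where sqlContext is predefined, this would be called as rank_links(sqlContext, SAMPLE_EDGES).show().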

Maybe this library has what you need.
