Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Load MySQL table into RDD

Expert Contributor

How can I load a complete MySQL table into an RDD using Spark?

1 ACCEPTED SOLUTION

Super Collaborator

There is a JDBC RDD function:

new JdbcRDD(sc: SparkContext, getConnection: () ⇒ Connection, sql: String, lowerBound: Long, upperBound: Long, numPartitions: Int, mapRow: (ResultSet) ⇒ T = JdbcRDD.resultSetToObjectArray)(implicit arg0: ClassTag[T])
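For anyone landing here later, a minimal sketch of how that constructor might be used. The JDBC URL, credentials, table name (employees) and column names (id, name) are all made up for illustration, and the MySQL Connector/J jar is assumed to be on the classpath. Note that JdbcRDD expects the query to contain exactly two ? placeholders, which it fills with the (inclusive) bounds of each partition, so to pull the complete table the lowerBound/upperBound pair should cover the full range of the numeric key.

```scala
import java.sql.{DriverManager, ResultSet}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object LoadMySqlTable {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("LoadMySqlTable")
    val sc = new SparkContext(conf)

    // Hypothetical connection details; replace with your own.
    val url = "jdbc:mysql://localhost:3306/mydb"
    val user = "spark"
    val password = "secret"

    val rows = new JdbcRDD(
      sc,
      // Each partition opens its own connection on the executor.
      () => DriverManager.getConnection(url, user, password),
      // The two ? placeholders are required; JdbcRDD binds them to
      // the lower and upper bound of each partition's key range.
      "SELECT id, name FROM employees WHERE id >= ? AND id <= ?",
      1L,      // lowerBound: assumed smallest id in the table
      100000L, // upperBound: assumed largest id in the table
      4,       // numPartitions: the key range is split into 4 slices
      // mapRow: turn each ResultSet row into a (Long, String) tuple.
      (rs: ResultSet) => (rs.getLong("id"), rs.getString("name"))
    )

    rows.take(5).foreach(println)
    sc.stop()
  }
}
```

If mapRow is left out, the default JdbcRDD.resultSetToObjectArray returns each row as an Array[Object]. Rows whose key falls outside the given bounds are not read, so it is worth checking MIN(id) and MAX(id) first.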


2 REPLIES


I'm not aware of a direct connector to MySQL. You could use Sqoop to ingest the contents of your table into HDFS, then use the SparkContext's textFile() method to load it as an RDD.
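A rough sketch of the Spark side of that approach, assuming the Sqoop import has already written the table as plain text files. The HDFS path is hypothetical, and the comma separator is an assumption based on Sqoop's default text output; adjust it if the import used a different field delimiter.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LoadSqoopOutput {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("LoadSqoopOutput")
    val sc = new SparkContext(conf)

    // Hypothetical target directory of the Sqoop import.
    val lines = sc.textFile("hdfs:///user/etl/employees")

    // Split each line into its fields. The -1 limit keeps trailing
    // empty fields, so every row has the same number of columns.
    val rows = lines.map(_.split(",", -1))

    rows.take(5).foreach(fields => println(fields.mkString(" | ")))
    sc.stop()
  }
}
```

A plain split like this breaks on values that themselves contain the delimiter, so it only suits tables whose text columns are free of commas.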
