Support Questions

Find answers, ask questions, and share your expertise

Load MySQL table into RDD

Expert Contributor

How can I load a complete MySQL table into an RDD using Spark?

1 ACCEPTED SOLUTION

Super Collaborator

Spark has a JdbcRDD for this. Its constructor is:

new JdbcRDD(sc: SparkContext, getConnection: () ⇒ Connection, sql: String, lowerBound: Long, upperBound: Long, numPartitions: Int, mapRow: (ResultSet) ⇒ T = JdbcRDD.resultSetToObjectArray)(implicit arg0: ClassTag[T])
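
A minimal sketch of using that constructor against MySQL is below; the JDBC URL, credentials, table name, id column, and partition bounds are placeholders you would replace with your own values.

import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object MySqlTableToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mysql-table-to-rdd"))

    // Factory that opens a new JDBC connection on each executor.
    def createConnection(): Connection = {
      Class.forName("com.mysql.jdbc.Driver")
      DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "password")
    }

    // The SQL must contain exactly two '?' placeholders; JdbcRDD binds them to the
    // partition bounds, so the table needs a numeric column (an assumed `id` here)
    // that can be split into ranges across numPartitions partitions.
    val rows = new JdbcRDD(
      sc,
      createConnection _,
      "SELECT id, name FROM my_table WHERE id >= ? AND id <= ?",
      lowerBound = 1,
      upperBound = 1000000,
      numPartitions = 10,
      mapRow = (rs: ResultSet) => (rs.getInt("id"), rs.getString("name"))
    )

    rows.take(10).foreach(println)
    sc.stop()
  }
}

The MySQL JDBC driver jar must be on the classpath of both the driver and the executors for this to run.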


2 REPLIES


I'm not aware of a direct connector to MySQL. You could use Sqoop to ingest the contents of your table into HDFS, then use the SparkContext's textFile() method to load it as an RDD.
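
A rough sketch of that approach follows; the Sqoop command, HDFS path, and comma delimiter are assumptions, and the Spark snippet is meant for a spark-shell session where sc already exists.

// Export the table to HDFS with Sqoop first, e.g.:
//
//   sqoop import --connect jdbc:mysql://dbhost:3306/mydb --table my_table \
//     --username user --password password --target-dir /data/my_table
//
// Sqoop writes comma-delimited text files by default, so each line can be
// split back into columns after loading it with textFile().
val rows = sc.textFile("hdfs:///data/my_table")
  .map(_.split(","))

rows.take(10).foreach(fields => println(fields.mkString(" | ")))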
