Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Converting JSON to Rdd

New Member

I am getting a JSON response, and in my Spark SQL data source I need to read the data, infer the schema for the JSON, and convert it to an RDD[Row]. Is there a class in Spark that does this?

Thanks

1 ACCEPTED SOLUTION

Super Collaborator
// jsonRdd (placeholder name): an RDD[String] where each element is one JSON object
val dataframe = sqlContext.read.json(jsonRdd)

See https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameReader.html#json(org.apac...


10 REPLIES

New Member

I don't want to read from files. I have the JSON data in a variable in my code; it comes from an HTTP response.

Super Guru

@Akash Mehta

So even the following won't work for you? If not, I don't think there is currently another way, given that we have looked at all the other options.

// A DataFrame can be created for a JSON dataset represented by
// an RDD[String] storing one JSON object per string.
val anotherPeopleRDD = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val anotherPeople = sqlContext.read.json(anotherPeopleRDD)

Super Guru

@Akash Mehta Can you do something like this?

dataframe = sqlContext.read.format("json").load(your json here)

New Member

But "your json here" has to be a path, and I have the JSON from an HTTP response (converted to a string).

I need to read from that string, infer the schema, and convert it to an RDD[Row].

Super Guru

load will infer the schema and convert each record to a Row. The question is whether it will accept an HTTP URL. Can you try?

New Member

Yes, load will do that, but load requires an input path, and I have my JSON stored in a string variable.

Super Collaborator
// jsonRdd (placeholder name): an RDD[String] where each element is one JSON object
val dataframe = sqlContext.read.json(jsonRdd)

See https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameReader.html#json(org.apac...

New Member

This will output a DataFrame, and I need an RDD[Row].
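
For what it's worth, a DataFrame exposes its rows as an RDD[Row] through its rdd member, so the suggestions in this thread can probably be combined. Below is a minimal sketch, assuming the Spark 1.6 API linked above and a local master. The object name JsonStringToRowRdd, the helper toRows, and the sample jsonResponse string (standing in for the HTTP response body) are made up for illustration and are not from the original posts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

object JsonStringToRowRdd {

  // Hypothetical helper: turns one JSON string into an RDD[Row],
  // letting Spark infer the schema from the JSON itself.
  def toRows(sqlContext: SQLContext, json: String): RDD[Row] = {
    // Wrap the in-memory string in a single-element RDD[String].
    val jsonRdd: RDD[String] = sqlContext.sparkContext.parallelize(json :: Nil)
    // read.json(RDD[String]) infers the schema and returns a DataFrame.
    val dataframe: DataFrame = sqlContext.read.json(jsonRdd)
    // The rdd member exposes the DataFrame's rows as an RDD[Row].
    dataframe.rdd
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("json-string-to-row-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Assumption: stand-in for the body of the HTTP response.
    val jsonResponse = """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}"""

    val rows: RDD[Row] = toRows(sqlContext, jsonResponse)
    rows.collect().foreach(println)

    sc.stop()
  }
}
```

If the inferred schema is also needed, it is available on the DataFrame (dataframe.schema) before calling rdd, so a variant of the helper could return both.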