How to define datatype when creating dataframe using sql.types

I am trying to convert a text file to a DataFrame. I found the following method, which builds the schema programmatically instead of using a case class.
But where is the data type of each field defined with this method?

 

val people = sc.textFile("file:/home/edureka/dmishra/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
          StructType(
          schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("select name,age from people")
results.map(t => "Name: " + t(0) + " Age: " + t(1)).collect().foreach(println)

scala> results.dtypes.foreach(println)
(name,StringType)
(age,StringType)

 

Where is the data type assigned for the DataFrame? How can I define age as an integer in this case, and if there is a date field, where do I define its type?

Thanks

1 ACCEPTED SOLUTION

The line below sets the data type of both fields to StringType, because every field name is mapped to the same StructField type:

 

val schema =
          StructType(
          schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
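If you want to keep the schema-string style from the question, one possible variation is to put the type next to each field name and map it to a Spark type. This is only a sketch: the "name:type" convention and the toDataType helper are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.types.{DataType, StructType, StructField, StringType, IntegerType, DateType}

// Hypothetical convention: each field is written as name:type
val typedSchemaString = "name:string age:int"

// Hypothetical helper that maps a short type name to a Spark SQL DataType
def toDataType(typeName: String): DataType = typeName match {
  case "string" => StringType
  case "int"    => IntegerType
  case "date"   => DateType
  case other    => throw new IllegalArgumentException(s"Unsupported type: $other")
}

// Build one StructField per "name:type" entry
val typedSchema = StructType(
  typedSchemaString.split(" ").map { field =>
    val Array(fieldName, typeName) = field.split(":")
    StructField(fieldName, toDataType(typeName), nullable = true)
  })
```

With this, age comes out as IntegerType while name stays StringType.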

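You can define a custom schema with an explicit type per field as follows (note that IntegerType has to be imported as well):

 

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val customSchema = StructType(Array(
    StructField("name", StringType, true),
    StructField("age", IntegerType, true)))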

You can add further fields, each with its own type, to this schema definition in the same way.
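For the date field you asked about, a rough sketch could look like the one below. The birthdate column, the sample lines and the yyyy-MM-dd format are assumptions for illustration, since your file only has name and age. It is meant to be pasted into spark-shell; getOrCreate reuses the shell's existing context if there is one.

```scala
import java.sql.Date
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DateType}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("date-schema-sketch").setMaster("local[*]"))
val sqlContext = SQLContext.getOrCreate(sc)

// Hypothetical input lines: name,age,birthdate (yyyy-MM-dd)
val lines = sc.parallelize(Seq("Michael,29,1986-03-14", "Andy,30,1985-07-02"))

val schemaWithDate = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("birthdate", DateType, true)))

// Each Row value must match its declared type: String, Int, java.sql.Date
val rows = lines.map(_.split(",")).map(p =>
  Row(p(0).trim, p(1).trim.toInt, Date.valueOf(p(2).trim)))

val df = sqlContext.createDataFrame(rows, schemaWithDate)
df.printSchema()
```

Date.valueOf only accepts the yyyy-MM-dd form, so a different date format in the file would need its own parsing step.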

 

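Then use this customSchema when creating the DataFrame. The values in each Row must match the declared types, so age has to be converted to an Int when building rowRDD; otherwise the job fails at runtime because a String is found where an integer is expected:

 

val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)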
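To check that the types took effect, here is a minimal end-to-end sketch. It assumes a few in-memory sample lines as a stand-in for people.txt (the names and ages are made up) and is meant to be pasted into spark-shell; getOrCreate reuses the shell's existing context if there is one.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("custom-schema-sketch").setMaster("local[*]"))
val sqlContext = SQLContext.getOrCreate(sc)

// Stand-in for people.txt: hypothetical "name, age" lines
val people = sc.parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))

val customSchema = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

// age is converted to Int so that each Row matches the schema
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)

// Same check as in the question; age should no longer be a string column
peopleDataFrame.dtypes.foreach(println)

// A numeric comparison now works directly on the age column
val adults = peopleDataFrame.filter(peopleDataFrame("age") >= 21)
adults.show()
```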

 

Also for details, please see this page
