How to define datatype when creating dataframe using sql.types

dmishraoc — Fri, 16 Sep 2022 11:27:54 GMT

I am trying to convert a text file to a DataFrame. I found the following method, which builds a schema programmatically instead of using a case class.
But where is the data type of each field defined if we go with this method?

val people = sc.textFile("file:/home/edureka/dmishra/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
          StructType(
          schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("select name,age from people")
results.map(t => "Name: " + t(0) + ", Age: " + t(1)).collect().foreach(println)

scala> results.dtypes.foreach(println)
(name,StringType)
(age,StringType)

Where is the data type assigned for the DataFrame? How do I define age as an integer data type in this case, and if there is a date field, where do I define its type?

Thanks

Re: How to define datatype when creating dataframe using sql.types

_Umesh — Sat, 15 Apr 2017 03:39:09 GMT

It is the line below that sets the data type of both fields to StringType:

val schema =
          StructType(
          schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
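Because that line maps every name in schemaString to StringType, one way to keep the schemaString style while still giving each field its own type is to put the type into the string itself. The sketch below assumes a hypothetical "fieldName:typeName" format and a hypothetical helper toDataType that only knows a few type names; neither is part of the Spark API.

```scala
import org.apache.spark.sql.types._

object TypedSchemaString {
  // Hypothetical helper: maps a short type name to a Spark SQL DataType.
  // Only a handful of types are covered; extend the match as needed.
  def toDataType(typeName: String): DataType = typeName.toLowerCase match {
    case "string" => StringType
    case "int"    => IntegerType
    case "long"   => LongType
    case "double" => DoubleType
    case "date"   => DateType
    case other    => throw new IllegalArgumentException(s"Unsupported type: $other")
  }

  // Assumed format: space-separated "fieldName:typeName" pairs.
  val schemaString = "name:string age:int"

  // Same shape as the original schema, but each field gets its own DataType.
  val schema: StructType = StructType(
    schemaString.split(" ").map { field =>
      val Array(fieldName, typeName) = field.split(":")
      StructField(fieldName, toDataType(typeName), nullable = true)
    }
  )
}
```

The rows still have to carry values of the matching JVM types (an Int for the age column), whichever way the schema is built.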

You can define your own custom schema as follows:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
val customSchema = StructType(Array(
    StructField("name", StringType, true),
    StructField("age", IntegerType, true)))

You can add more fields to the schema definition above in the same way.
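For the date field mentioned in the question, a minimal sketch is to declare the column as DateType and put a java.sql.Date into the Row. The "birthDate" column, the schemaWithDate name, the toRow helper and the "name,age,yyyy-MM-dd" line format are all assumptions for illustration.

```scala
import java.sql.Date
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

object DateSchemaExample {
  // Schema with an extra, hypothetical "birthDate" column of DateType.
  val schemaWithDate = StructType(Array(
    StructField("name", StringType, true),
    StructField("age", IntegerType, true),
    StructField("birthDate", DateType, true)))

  // Assumed input line format: "name,age,yyyy-MM-dd".
  // Each value is converted to the JVM type Spark expects for its column:
  // IntegerType -> Int, DateType -> java.sql.Date.
  def toRow(line: String): Row = {
    val p = line.split(",").map(_.trim)
    Row(p(0), p(1).toInt, Date.valueOf(p(2)))
  }
}
```

Used as people.map(DateSchemaExample.toRow), the resulting RDD[Row] can be passed to createDataFrame together with schemaWithDate.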

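Because each Row must hold values of the types declared in the schema, the age value also has to be converted to an Int when building rowRDD, e.g. Row(p(0), p(1).trim.toInt). Otherwise Spark typically fails at runtime when it finds a String in an IntegerType column.

Then you can use this customSchema while creating the DataFrame as follows:

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)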
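Putting the pieces together, a sketch of the whole flow could look like the following. It assumes Spark 2.x or later, so it takes a SparkSession instead of the sqlContext used in the question, and it takes the input lines as a Seq (the hypothetical build helper stands in for reading people.txt).

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object TypedPeopleDataFrame {
  val customSchema = StructType(Array(
    StructField("name", StringType, true),
    StructField("age", IntegerType, true)))

  // Hypothetical helper: turns "name, age" lines into a typed DataFrame.
  def build(spark: SparkSession, lines: Seq[String]): DataFrame = {
    val people = spark.sparkContext.parallelize(lines)
    // age is converted to Int so each Row matches IntegerType in the schema.
    val rowRDD = people.map(_.split(",")).map(p => Row(p(0).trim, p(1).trim.toInt))
    spark.createDataFrame(rowRDD, customSchema)
  }
}
```

With a SparkSession named spark, TypedPeopleDataFrame.build(spark, Seq("Michael, 29")).printSchema() should list age as integer, and numeric filters such as where("age > 20") then compare numbers rather than strings.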

Also for details, please see this page.
