How to define datatype when creating dataframe using sql.types

I am trying to convert a text file to a DataFrame. I found the following method, which builds the schema programmatically instead of using a case class.
But where is the data type of each field defined with this approach?

 

val people = sc.textFile("file:/home/edureka/dmishra/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
          StructType(
          schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("select name,age from people")
results.map(t => "Name: " + t(0) + ", Age: " + t(1)).collect().foreach(println)

scala> results.dtypes.foreach(println)
(name,StringType)
(age,StringType)

 

Where is the data type assigned for the DataFrame? How can I define age as an integer in this case, and if there is a date field, where do I define its type?

Thanks

Accepted Solutions

Re: How to define datatype when creating dataframe using sql.types

The following line is what sets the data type of both fields to StringType:

 

val schema =
          StructType(
          schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

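You can define a custom schema with a specific type for each field as follows:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
val customSchema = StructType(Array(
    StructField("name", StringType, true),
    StructField("age", IntegerType, true)))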

You can add further fields to this schema definition in the same way, each with its own type.
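
For the date field mentioned in the question, a minimal sketch might look like the following. The third column "birthdate", its yyyy-MM-dd format, and the helper name toRow are assumptions for illustration and are not part of the original file. Spark SQL represents DateType values in a Row as java.sql.Date, so the string has to be parsed before the Row is built.

```scala
import java.sql.Date
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DateType}

// Hypothetical schema with an extra "birthdate" column (an assumption, not in people.txt).
val schemaWithDate = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("birthdate", DateType, true)))

// Hypothetical helper: parse one comma-separated line into a Row
// whose values match the types declared in schemaWithDate.
def toRow(line: String): Row = {
  val p = line.split(",").map(_.trim)
  Row(p(0), p(1).toInt, Date.valueOf(p(2))) // Date.valueOf expects yyyy-MM-dd
}
```

An RDD of such rows, for example people.map(toRow), could then be passed to sqlContext.createDataFrame together with schemaWithDate.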

 

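The values in each Row must match the types declared in the schema, so convert age to an Int when building the rows (a String value in an IntegerType column fails at runtime). Then pass customSchema when creating the DataFrame:

val typedRowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val peopleDataFrame = sqlContext.createDataFrame(typedRowRDD, customSchema)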
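
To check that the types took effect, the whole flow can be tried on a small in-memory sample. This is a sketch that assumes a Spark 1.x spark-shell where sc and sqlContext are already defined, as in the question. The sample lines and the names lines, typedRows and typedDF are made up for illustration.

```scala
// Assumes a Spark 1.x spark-shell, where `sc` and `sqlContext` already exist.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Stand-in for people.txt; these sample lines are invented.
val lines = sc.parallelize(Seq("Michael, 29", "Andy, 30"))

val customSchema = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

// Convert age to Int so each Row value matches the declared IntegerType.
val typedRows = lines.map(_.split(",")).map(p => Row(p(0).trim, p(1).trim.toInt))

val typedDF = sqlContext.createDataFrame(typedRows, customSchema)

// Prints one (columnName, typeName) pair per column, as in the question.
typedDF.dtypes.foreach(println)
```

With this schema, age should be reported as IntegerType rather than StringType.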

 

Also for details, please see this page
