Created on 04-14-2017 06:33 PM - edited 09-16-2022 04:27 AM
I am trying to convert a text file to a DataFrame. I found the following method, which builds the schema programmatically instead of using a case class.
But where is the data type of each field defined if we use this method?
val people = sc.textFile("file:/home/edureka/dmishra/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
StructType(
schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("select name,age from people")
val r = results.map(t => "Name: " + t(0) + "Age : " + t(1)).collect().foreach(println)
scala> results.dtypes.foreach(println)
(name,StringType)
(age,StringType)
Where is the data type assigned for the DataFrame? How do I define age as an integer data type in this case, and if there is a date field, where do I define it?
Thanks
Created on 04-14-2017 08:31 PM - edited 04-14-2017 08:39 PM
The line below is what sets the data type of both fields to StringType:
val schema =
StructType(
schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
You can define your own custom schema as follows:
import org.apache.spark.sql.types.IntegerType
val customSchema = StructType(Array(
StructField("name", StringType, true),
StructField("age", IntegerType, true)))
You can add additional fields to the above schema definition as well.
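You can then use this customSchema when creating the DataFrame. The values in each Row must match the types declared in the schema, so age has to be converted to an Int when building rowRDD:
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)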
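For the date field mentioned in the question, the same approach should extend naturally: declare the field as DateType and put a java.sql.Date in the Row. The following is only a sketch, assuming a spark-shell session where sc and sqlContext already exist. The sample lines, the "birthdate" field name, and the yyyy-MM-dd format are assumptions made for illustration and are not taken from the original people.txt.

```scala
import java.sql.Date
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DateType}

// Hypothetical input lines in the form "name, age, yyyy-MM-dd"
val lines = sc.parallelize(Seq(
  "Michael, 29, 1988-01-15",
  "Andy, 30, 1987-06-02"))

// One StructField per column; the data type is chosen here
val schemaWithDate = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("birthdate", DateType, true)))

// Convert each raw string to the type declared in the schema.
// Date.valueOf expects the yyyy-[m]m-[d]d format.
val typedRows = lines.map(_.split(",")).map(p =>
  Row(p(0).trim, p(1).trim.toInt, Date.valueOf(p(2).trim)))

val typedDF = sqlContext.createDataFrame(typedRows, schemaWithDate)
typedDF.dtypes.foreach(println)
```

If the dates in your file use a different format, Date.valueOf will throw an exception, so you would need to parse them yourself (for example with java.text.SimpleDateFormat) before building the Row.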
Also for details, please see this page.
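As an alternative, if you already have a DataFrame whose columns are all StringType (as in the question), you can probably leave the schema alone and cast the column afterwards. This is a minimal sketch, again assuming a spark-shell session with sc and sqlContext available. The two sample rows and the names stringDF and castedDF are made up for the example.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// An all-string schema, like the one built from schemaString in the question
val stringSchema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("age", StringType, true)))

// Hypothetical sample data standing in for people.txt
val stringDF = sqlContext.createDataFrame(
  sc.parallelize(Seq(Row("Michael", "29"), Row("Andy", "30"))),
  stringSchema)

// Cast the age column to an integer and keep the original column name
val castedDF = stringDF.select(
  stringDF("name"),
  stringDF("age").cast(IntegerType).as("age"))

castedDF.dtypes.foreach(println)
```

Note that with a cast, a value that cannot be parsed as an integer becomes null instead of raising an error, whereas toInt on the RDD side throws a NumberFormatException. Which behaviour you want depends on how clean the input file is.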