<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to define datatype when creating dataframe using sql.types in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-define-datatype-when-creating-dataframe-using-sql/m-p/53709#M59511</link>
    <description>&lt;P&gt;&lt;SPAN&gt;I am trying to convert a text file to a DataFrame. I found the following method, which does not use a case class.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;But where is the data type of each field defined if we go with this method?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val people = sc.textFile("file:/home/edureka/dmishra/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema = StructType(
  schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p =&amp;gt; Row(p(0), p(1).trim))

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("select name,age from people")
results.map(t =&amp;gt; "Name: " + t(0) + ", Age: " + t(1)).collect().foreach(println)&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;scala&amp;gt; results.dtypes.foreach(println)&lt;BR /&gt;(name,StringType)&lt;BR /&gt;(age,StringType)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Where is the data type assigned for the DataFrame? How can I define age as an integer in this case, and if there is a date field, where do I define its type?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 11:27:54 GMT</pubDate>
    <dc:creator>dmishraoc</dc:creator>
    <dc:date>2022-09-16T11:27:54Z</dc:date>
    <item>
      <title>How to define datatype when creating dataframe using sql.types</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-define-datatype-when-creating-dataframe-using-sql/m-p/53709#M59511</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I am trying to convert a text file to a DataFrame. I found the following method, which does not use a case class.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;But where is the data type of each field defined if we go with this method?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val people = sc.textFile("file:/home/edureka/dmishra/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema = StructType(
  schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p =&amp;gt; Row(p(0), p(1).trim))

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("select name,age from people")
results.map(t =&amp;gt; "Name: " + t(0) + ", Age: " + t(1)).collect().foreach(println)&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;scala&amp;gt; results.dtypes.foreach(println)&lt;BR /&gt;(name,StringType)&lt;BR /&gt;(age,StringType)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Where is the data type assigned for the DataFrame? How can I define age as an integer in this case, and if there is a date field, where do I define its type?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:27:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-define-datatype-when-creating-dataframe-using-sql/m-p/53709#M59511</guid>
      <dc:creator>dmishraoc</dc:creator>
      <dc:date>2022-09-16T11:27:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to define datatype when creating dataframe using sql.types</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-define-datatype-when-creating-dataframe-using-sql/m-p/53710#M59512</link>
      <description>&lt;P&gt;The following statement is what sets the data type of both fields to StringType:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val schema = StructType(
  schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, true)))&lt;/PRE&gt;&lt;P&gt;You can define a custom schema with an explicit type for each field instead:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
val customSchema = StructType(Array(
    StructField("name", StringType, true),
    StructField("age", IntegerType, true)))&lt;/PRE&gt;&lt;P&gt;You can add further fields to this schema definition in the same way; a date field, for example, would use DateType.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then pass customSchema when creating the DataFrame. The values in each Row must match the declared types, so age has to be converted to an Int when building rowRDD:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val rowRDD = people.map(_.split(",")).map(p =&amp;gt; Row(p(0), p(1).trim.toInt))
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For more details, please see &lt;A href="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L299" target="_self"&gt;this page&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 03:39:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-define-datatype-when-creating-dataframe-using-sql/m-p/53710#M59512</guid>
      <dc:creator>_Umesh</dc:creator>
      <dc:date>2017-04-15T03:39:09Z</dc:date>
    </item>
  </channel>
</rss>

