The key thing to remember is that Spark RDDs and DataFrames are immutable: once created, they cannot be changed.

However, there are many situations where you want a column to have a different type.

For example, the cars.csv sample that ships with Spark has a year column typed as String. If you want to use a datetime function, you need the column to be a date/timestamp type. Since you cannot modify the existing DataFrame, you cast the column and produce a new DataFrame.

Here is an example that changes the column type (casting year from String to Int):

// Spark 1.x: load cars.csv through the spark-csv package; "header" -> "true" keeps the column names
val df2 = sqlContext.load("com.databricks.spark.csv", Map("path" -> "file:///Users/vshukla/projects/spark/sql/core/src/test/resources/cars.csv", "header" -> "true"))
df2.printSchema()

import sqlContext.implicits._  // brings the 'year symbol-to-Column syntax into scope
// Add a casted copy of year, then select it back under the original name
val df4 = df2.withColumn("year2", 'year.cast("Int")).select('year2 as 'year, 'make, 'model, 'comment)
df4.printSchema()
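If you are on a newer Spark release (2.0+), here is a minimal sketch of the same cast using the SparkSession reader API, assuming a session named spark and the same file path as above:

import org.apache.spark.sql.functions.col

val cars = spark.read.option("header", "true").csv("file:///Users/vshukla/projects/spark/sql/core/src/test/resources/cars.csv")
// withColumn with an existing column name replaces the column in place, so no rename is needed
val carsTyped = cars.withColumn("year", col("year").cast("int"))
carsTyped.printSchema()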
Comments
Contributor

Hi,

It's not working for me; I'm unable to cast the column data type. Here is my code. I am getting only null values. Thanks!

val df4 = businessDF.withColumn("Estimated Gross Loss1", $"Estimated Gross Loss".cast("Int")).select($"Estimated Gross Loss1" as "Estimated Gross Loss")
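A likely explanation, offered only as a sketch: cast("Int") typically returns null for any string that is not a plain integer, so if the "Estimated Gross Loss" values contain commas, a currency symbol, or a decimal point, every cast comes back null. One workaround (the cleanup pattern below is an assumption about the data) is to strip the formatting with regexp_replace and cast to double instead:

import org.apache.spark.sql.functions.{col, regexp_replace}

// Keep only digits, minus sign, and decimal point before casting (hypothetical cleanup rule)
val cleaned = businessDF.withColumn(
  "Estimated Gross Loss1",
  regexp_replace(col("Estimated Gross Loss"), "[^0-9.-]", "").cast("double"))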


Not applicable

Nice article. I am looking for a solution: instead of specifying each column separately, is there any way to handle the data type dynamically? Let's say my DataFrame has 50 columns, of which 8 are decimals, and I need to convert all decimal columns to double. Can we do that directly, without specifying the column names?
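One way to do this, sketched under the assumption of a DataFrame named df: walk the schema and cast every DecimalType column to double, leaving the other columns untouched.

import org.apache.spark.sql.types.{DecimalType, DoubleType}

// Fold over the schema, replacing each decimal column with a double-typed copy of itself
val allDoubles = df.schema.fields.foldLeft(df) { (acc, field) =>
  field.dataType match {
    case _: DecimalType => acc.withColumn(field.name, acc(field.name).cast(DoubleType))
    case _              => acc
  }
}
allDoubles.printSchema()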

New Contributor

I am also looking for something like this. I need to convert all my date datatypes to varchar in a DataFrame having more than 300 columns.

Any suggestions?
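The same fold-over-schema sketch from the previous comment should work here too, again assuming a DataFrame named df: match on DateType (and TimestampType, if those should be converted as well) and cast to string.

import org.apache.spark.sql.types.{DateType, StringType, TimestampType}

// Cast every date/timestamp column to string, leaving the rest as-is
val allStrings = df.schema.fields.foldLeft(df) { (acc, field) =>
  field.dataType match {
    case DateType | TimestampType => acc.withColumn(field.name, acc(field.name).cast(StringType))
    case _                        => acc
  }
}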
