Community Articles
Find and share helpful community-sourced technical articles
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.
Labels (2)

The key thing to remember is that in Spark RDD/DF are immutable. So once created you can not change them.

However there are many situation where you want the column type to be different.

E.g By default Spark comes with cars.csv where year column is a String. If you want to use a datetime function you need the column as a Datetime. You can change the column type from string to date in a new dataframe.

Here is an example to change the column type.

val df2 = sqlContext.load("com.databricks.spark.csv", Map("path" -> "file:///Users/vshukla/projects/spark/sql/core/src/test/resources/cars.csv", "header" -> "true"))
df2.printSchema()
val df4 = df2.withColumn("year2", 'year.cast("Int")).select('year2 as 'year, 'make, 'model, 'comment)
df4.printSchema
47,604 Views
0 Kudos
Comments
Contributor

Hi,

Its not working for me ..Unable to cast the column data type.Here is my code.I am getting only null values.Thanks!

val df4 = businessDF.withColumn("Estimated Gross Loss1", $"Estimated Gross Loss".cast("Int")).select($"Estimated Gross Loss1" as "Estimated Gross Loss")


Not applicable

Nice article. I am looking for a solution, Instead of specifying a each column separately. is there any way we can dynamically handle the datatype. Lets say in my dataframe i have 50 columns out of 8 are decimals and need to convert all decimal datatype to double. Without specify a column name can we do that directly?

New Contributor

I am also looking for something like this. I need to convert all my date datatypes to varchar in a dataframe having more than 300 columns. 

 

Any suggestions?? 

Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
1 of 1
Last update:
‎05-21-2016 08:35 AM
Updated by:
 
Contributors
Top Kudoed Authors