rename columns of the dataframe

Master Collaborator

Hi, I have a DataFrame (loaded from a CSV file) where schema inference took the column names from the file. I am trying to get rid of the whitespace in the column names, because otherwise the DataFrame cannot be saved as a Parquet file, and I did not find any useful method for renaming.

The method withColumnRenamed("Company ID","Company_ID") works, but I need to repeat it for every column in the DataFrame. I tried to use the toDF method, such as:

val dfnew = df.toDF( df.columns.map( a => a.replace(" ","_") ) );

but it failed.

Any ideas?
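A note on the toDF attempt: the call most likely fails because toDF is declared with a varargs parameter (colNames: String*), so an Array[String] cannot be passed as a single argument and has to be expanded with `: _*`. The sketch below keeps the renaming logic in plain Scala so it can be run without Spark; the object name ColumnNames and the helper sanitized are hypothetical, and the Spark calls in the trailing comments assume a DataFrame named df as in the question.

```scala
object ColumnNames {
  // Replace every space in each column name with an underscore.
  def sanitized(columns: Array[String]): Array[String] =
    columns.map(_.replace(" ", "_"))
}

// Assumed usage with a Spark DataFrame `df` in scope (not defined here):
//   // expand the array into varargs with `: _*`
//   val dfnew = df.toDF(ColumnNames.sanitized(df.columns): _*)
//
//   // alternative: apply withColumnRenamed once per column via foldLeft
//   val dfnew2 = df.columns.foldLeft(df) { (acc, name) =>
//     acc.withColumnRenamed(name, name.replace(" ", "_"))
//   }
```

Both variants rename every column in one expression, so withColumnRenamed does not have to be written out by hand for each column.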

 

1 ACCEPTED SOLUTION


Re: rename columns of the dataframe

Master Collaborator

I have found a solution to this:

 

df.registerTempTable("tmp");

val newdf = sqlContext.sql("""select `Company ID` as Company_ID, `Product ID` as Product_ID, .. from tmp""");

newdf.saveAsParquetFile(...);

 

T.

 


Re: rename columns of the dataframe

Master Collaborator
Update: a column whose name contains whitespace has to be enclosed in backticks (`), not single quotes, which would select a string literal instead of the column. So the correct syntax is:
"""select `Company ID` as Company_ID, .... """
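
For a DataFrame with many columns, the backticked select list does not have to be typed by hand. The following is a minimal sketch, not part of the original solution, that builds the list from the column names; the object SelectList and its methods are hypothetical names, and it assumes that no column name itself contains a backtick.

```scala
object SelectList {
  // Turn one column name into "`Original Name` as Original_Name".
  // Assumption: the name contains no backticks; those would need extra escaping.
  def aliased(column: String): String = {
    val alias = column.replace(" ", "_")
    s"`$column` as $alias"
  }

  // Join the aliased columns into the body of a select clause.
  def build(columns: Seq[String]): String =
    columns.map(aliased).mkString(", ")
}

// Assumed usage, with the temp table "tmp" registered as in the accepted solution:
//   val newdf = sqlContext.sql(s"select ${SelectList.build(df.columns)} from tmp")
```

Columns without whitespace are simply aliased to their own name, so the same query works for every column of the table.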