Spark Dataframes: How can I change the order of columns in Java/Scala?

Explorer

After joining two dataframes, I find that the column order differs from what I expected.

Ex: Joining two data frames with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].

How can I change the order of the columns (e.g., [a,b,c,d,e])? I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?

1 ACCEPTED SOLUTION

Explorer

@Jestin: Why do you need to sort the columns of a dataframe? Could you please elaborate?

However, in Java there is no built-in function dedicated to reordering columns.

6 REPLIES

Your sorting should happen on the basis of the key. Here is an example in Scala:

val file = sc.textFile("some_local_text_file_pathname")
val wordCounts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 1)  // 2nd arg configures one task (same as number of partitions)
  .map(item => item.swap) // interchanges position of entries in each tuple
  .sortByKey(true, 1) // 1st arg configures ascending sort, 2nd arg configures one task
  .map(item => item.swap)



Super Guru

Why does the order of columns matter?

Explorer

There are scenarios (though bad ones) where inserting data into a database over a JDBC connection requires the columns to be in lexicographic order. I'm not sure whether Jestin Ma is facing a similar issue.
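If lexicographic column order really is what the insert needs, something like the following might do it on a DataFrame. This is only a sketch: `SortColumns`, `sortedNames`, and `sortColumns` are names I made up, and it assumes plain column names (no dots or other characters that would need backtick quoting).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object SortColumns {
  // Hypothetical helper: column names in lexicographic order.
  // String ordering is case-sensitive, so "B" sorts before "a".
  def sortedNames(names: Seq[String]): Seq[String] = names.sorted

  // Select every column of df in sorted-name order; the rows are unchanged.
  // Assumes plain column names (no dots that col() would parse as nesting).
  def sortColumns(df: DataFrame): DataFrame =
    df.select(sortedNames(df.columns.toSeq).map(col): _*)
}
```

Calling `SortColumns.sortColumns(df)` right before the JDBC write would hand the writer the columns in sorted order.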

To reorder tuples (columns) in Scala, I think you can just use a map, as in PySpark:

val rdd2 = rdd.map { case (x, y, z) => (z, y, x) } // pattern match to destructure each tuple

You should be able to build key-value pairs this way too.

val rdd2 = rdd.map { case (x, y, z) => (z, (y, x)) } // z becomes the key, (y, x) the value

This is very handy if you want to follow it up with sortByKey().

New Contributor

All you need to do is use select (worked for me). Do the following:

val new_df = df.select("a", "b", "c", "d", "e") // Assuming you want a, b, c, d, e to be your order
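If you would rather not type out every column, the order can also be computed and passed to `select` as `Column` varargs. A rough sketch, assuming the join from the question; `ReorderColumns`, `columnOrder`, and `moveToFront` are hypothetical names, not part of Spark:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReorderColumns {
  // Hypothetical helper: the requested columns first (skipping any that do
  // not exist), then the remaining columns in their current order.
  def columnOrder(existing: Seq[String], first: Seq[String]): Seq[String] =
    first.filter(existing.contains) ++ existing.filterNot(first.contains)

  // Reorder by selecting columns in the computed order; the rows are unchanged.
  def moveToFront(df: DataFrame, first: Seq[String]): DataFrame =
    df.select(columnOrder(df.columns.toSeq, first).map(col): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("reorder").getOrCreate()
    import spark.implicits._

    val ab = Seq(("a1", 1)).toDF("a", "b")
    val bcde = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")
    val joined = ab.join(bcde, "b") // the join column comes first in the result

    val fixed = moveToFront(joined, Seq("a", "b"))
    println(fixed.columns.mkString(", "))
    spark.stop()
  }
}
```

The same `select` method exists on the Java `Dataset<Row>` API, so the approach should carry over to Java as well.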
