Created 06-29-2016 07:31 PM
After joining two dataframes, I find that the column order is not what I expected.
Ex: Joining two dataframes with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].
How can I change the order of the columns (e.g., to [a,b,c,d,e])? I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?
Created 07-02-2016 01:07 PM
@Jestin: Why do you need to sort the columns in a dataframe? Could you please elaborate?
However, in Java there is no built-in function to reorder the columns.
Created 06-29-2016 07:38 PM
Your sorting should happen on the basis of the key; here is an example in Scala.
val file = sc.textFile("some_local_text_file_pathname")
val wordCounts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 1)  // 2nd arg configures one task (same as number of partitions)
  .map(item => item.swap) // interchanges position of entries in each tuple
  .sortByKey(true, 1)     // 1st arg configures ascending sort, 2nd arg configures one task
  .map(item => item.swap)
Created 07-02-2016 01:47 PM
Why does the order of columns matter?
Created 07-02-2016 03:29 PM
There are scenarios (though bad ones) where inserting data into a database over a JDBC connection requires the columns to be in lexicographic order. Not sure if Jestin Ma is facing a similar issue.
Created 07-03-2016 03:18 AM
To reorder tuples (columns) in Scala, I think you just use a map, like in PySpark. Note that Scala needs a pattern match (case) to destructure the tuple:
val rdd2 = rdd.map { case (x, y, z) => (z, y, x) }
You should be able to build key-value pairs this way too:
val rdd2 = rdd.map { case (x, y, z) => (z, (y, x)) }
This is very handy if you want to follow it up with sortByKey().
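For reference, here is a minimal runnable sketch of the tuple reordering described above, assuming Spark 2.x or later (SparkSession) running in local mode. The object name, the helper keyByLast, and the sample triples are all made up for illustration; they are not from the original post.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object TupleReorderSketch {
  // Hypothetical helper: move the last element to the front as the key,
  // so that pair-RDD operations such as sortByKey become available.
  def keyByLast(rdd: RDD[(Int, String, Double)]): RDD[(Double, (String, Int))] =
    rdd.map { case (x, y, z) => (z, (y, x)) }

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tuple-reorder-sketch")
      .getOrCreate()

    // Made-up sample data: (x, y, z) triples.
    val rdd = spark.sparkContext.parallelize(
      Seq((1, "one", 3.0), (2, "two", 1.0), (3, "three", 2.0)))

    // Sort ascending on the new key (the former third element).
    val sorted = keyByLast(rdd).sortByKey(ascending = true)
    sorted.collect().foreach(println)

    spark.stop()
  }
}
```

Keep in mind this works on RDDs of tuples; for a DataFrame the columns are named, so reordering is a matter of selecting them in the order you want.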
Created 07-15-2016 04:39 PM
All you need to do is use select (worked for me). Do the following:
val new_df = df.select("a", "b", "c", "d", "e") // Assuming you want a, b, c, d, e to be your order
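If you don't want to type out every column name, the same select call can take a computed list of columns. The sketch below assumes Spark 2.x or later in local mode; the object name, the helpers reorder and sortColumns, and the sample rows are hypothetical. It also covers the lexicographic ordering mentioned earlier in the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnOrderSketch {
  // Hypothetical helper: select the columns in an explicit order.
  def reorder(df: DataFrame, order: Seq[String]): DataFrame =
    df.select(order.map(col): _*)

  // Hypothetical helper: put the columns in lexicographic order.
  def sortColumns(df: DataFrame): DataFrame =
    reorder(df, df.columns.sorted.toSeq)

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-order-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up single-row dataframes with the column names from the question.
    val left = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")
    val right = Seq(("a1", 1)).toDF("a", "b")

    val joined = right.join(left, "b")
    println(joined.columns.mkString(","))              // order as produced by the join
    println(sortColumns(joined).columns.mkString(",")) // order after sorting the names

    spark.stop()
  }
}
```

The same idea should carry over to Java, since select there also accepts column names or Column objects; I have only sketched the Scala version here.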