Spark Dataframes: How can I change the order of columns in Java/Scala?
Labels: Apache Spark
Created 06-29-2016 07:31 PM
After joining two dataframes, I find that the column order differs from what I expected.
Ex: Joining two dataframes with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].
How can I change the order of the columns (e.g., to [a,b,c,d,e])? I've found ways to do it in Python/R but not in Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?
Created 07-02-2016 01:07 PM
@Jestin: Why do you need to sort the columns in a dataframe? Could you please elaborate?
However, as far as I know, Java has no dedicated built-in function for reordering columns.
Created 06-29-2016 07:38 PM
Your sorting should happen on the basis of the key. Here is an example in Scala:
val file = sc.textFile("some_local_text_file_pathname")
val wordCounts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 1)  // 2nd arg configures one task (same as number of partitions)
  .map(item => item.swap) // interchanges position of entries in each tuple
  .sortByKey(true, 1)     // 1st arg configures ascending sort, 2nd arg configures one task
  .map(item => item.swap)
Created 07-02-2016 01:47 PM
Why does the order of the columns matter?
Created 07-02-2016 03:29 PM
There are scenarios (though bad ones) where inserting data into a database over a JDBC connection requires the columns to be in lexicographical order. Not sure if Jestin Ma is facing a similar issue.
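For the lexicographic case mentioned above, one possible approach is to sort the column names and pass them back to select. This is only a sketch: `withSortedColumns` is a hypothetical helper name, and it assumes plain column names without dots (which `col` would otherwise interpret as nested fields).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not part of Spark): returns the same rows with the
// columns arranged in lexicographic order of their names.
// Assumes column names contain no dots or other special characters.
def withSortedColumns(df: DataFrame): DataFrame =
  df.select(df.columns.sorted.map(col): _*)
```

Calling `withSortedColumns(df)` before the JDBC write would then give a predictable column order regardless of how the dataframe was built.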
Created 07-03-2016 03:18 AM
To reorder tuple fields (columns) in Scala, I think you can just use a map, as in PySpark:
val rdd2 = rdd.map { case (x, y, z) => (z, y, x) }
You should also be able to build key-value pairs this way:
val rdd2 = rdd.map { case (x, y, z) => (z, (y, x)) }
This is very handy if you want to follow it up with sortByKey().
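A minimal sketch of that map-then-sortByKey idea, runnable in spark-shell or as a Scala script. The sample tuples are made up for illustration, and `SparkSession` assumes Spark 2.0 or later; on older versions an existing `sc` can be used directly.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("rdd-reorder").getOrCreate()
val sc = spark.sparkContext

// Made-up three-field records.
val rdd = sc.parallelize(Seq(("x1", "y1", 3), ("x2", "y2", 1), ("x3", "y3", 2)))

// Pattern-match on the tuple, move the third field into the key position,
// then sort ascending by that key.
val sortedByZ = rdd
  .map { case (x, y, z) => (z, (y, x)) }
  .sortByKey()

sortedByZ.collect().foreach(println)
```

Note that this reorders fields inside RDD tuples; it does not change the column order of a DataFrame.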
Created 07-15-2016 04:39 PM
@venki2404, all you need to do is use select (worked for me). Do the following:
val new_df = df.select("a", "b", "c", "d", "e") // assuming the column order you need is a, b, c, d, e
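To make the select approach concrete, here is a small end-to-end sketch that mimics the join from the question and then reorders the result. The data values and the names `left`, `right`, and `order` are made up for illustration, and `SparkSession` assumes Spark 2.0 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("reorder-columns").getOrCreate()
import spark.implicits._

// Toy frames shaped like the ones in the question (values are invented).
val left  = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")
val right = Seq(("a1", 1)).toDF("a", "b")

// Joining on the column name puts the join key first in the result.
val joined = right.join(left, "b")
println(joined.columns.mkString(","))

// Reorder by listing the columns in the order you want.
val reordered = joined.select("a", "b", "c", "d", "e")
println(reordered.columns.mkString(",")) // a,b,c,d,e

// The same reordering when the desired order is held in a collection.
val order = Seq("a", "b", "c", "d", "e")
val reorderedFromSeq = joined.select(order.map(col): _*)
```

The string-based `select` overload is also callable from Java, so `joined.select("a", "b", "c", "d", "e")` should work there as well.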
