Support Questions

jestinm · ‎06-29-2016

After joining two dataframes, I find that the column order is different from what I expected.

Ex: Joining two data frames with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].

How can I change the order of the columns (e.g., [a,b,c,d,e])? I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?

psingh15 · ‎07-02-2016

@Jestin: Why do you need to sort the columns in a dataframe? Could you please elaborate?

However, in Java there is no built-in function for reordering columns.

jyadav · ‎06-29-2016

Your sorting should happen on the basis of the key. Here is an example in Scala:

val file = sc.textFile("some_local_text_file_pathname")
val wordCounts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 1)  // 2nd arg configures one task (same as number of partitions)
  .map(item => item.swap) // interchanges position of entries in each tuple
  .sortByKey(true, 1) // 1st arg configures ascending sort, 2nd arg configures one task
  .map(item => item.swap)

TimothySpann · ‎07-02-2016

Why does the order of the columns matter?

psingh15 · ‎07-02-2016

There are scenarios (though bad ones) where inserting data into a database over a JDBC connection requires the columns to be in lexicographical order. I'm not sure whether Jestin is facing a similar issue.
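If the target really does need the columns in lexicographical order, one possible approach is to sort the column names and pass them to select. This is only a minimal sketch, assuming Spark 2.x and a local SparkSession; the sample data and the names df and sortedDf are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("sort-columns")
  .getOrCreate()
import spark.implicits._

// Hypothetical dataframe whose columns are not in alphabetical order.
val df = Seq((1, "x", 2.0)).toDF("c", "a", "b")

// df.columns returns the names as Array[String]; sort them and select in that order.
// Note: the default String ordering is case-sensitive (uppercase sorts before lowercase).
val sortedDf = df.select(df.columns.sorted.map(col): _*)
```

Only the column order changes here; the rows and the column types stay the same.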

don_jernigan · ‎07-03-2016

To reorder tuples (columns) in Scala, I think you just use a map, like in PySpark. Note that Scala needs a pattern match (case) to unpack the tuple:

val rdd2 = rdd.map { case (x, y, z) => (z, y, x) }

You should also be able to build key-value pairs this way.

val rdd2 = rdd.map { case (x, y, z) => (z, (y, x)) }

This is very handy if you want to follow it up with sortByKey().
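For reference, here is a rough sketch of the map-then-sortByKey idea on an RDD of 3-tuples. It assumes Spark 2.x with a local SparkSession; the sample values and the names rdd, keyed, and result are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `sc` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("rdd-reorder")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical RDD of (x, y, z) tuples.
val rdd = sc.parallelize(Seq(("x", 10, 3), ("y", 20, 1), ("z", 30, 2)))

// Move the third field to the front as the key; the rest becomes the value.
val keyed = rdd.map { case (x, y, z) => (z, (y, x)) }

// Sort ascending by the new key, using a single partition so collect() is fully ordered.
val result = keyed.sortByKey(ascending = true, numPartitions = 1).collect()
```

This works on RDDs of tuples only. For a DataFrame, select is the more direct route.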

venki2404 · ‎07-15-2016

All you need to do is use select (worked for me). Do the following:

val new_df = df.select("a", "b", "c", "d", "e") // Assuming you want a, b, c, d, e to be your order
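If the desired order is only known at runtime, the same select call can take a sequence of column names. Below is a minimal sketch, assuming Spark 2.x and a local SparkSession; the sample rows and the names left, right, joined, and desiredOrder are invented to mirror the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("reorder-columns")
  .getOrCreate()
import spark.implicits._

// Hypothetical data frames with columns [b,c,d,e] and [a,b].
val left  = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")
val right = Seq(("a1", 1)).toDF("a", "b")

// Joining on "b" puts the join column first, followed by the remaining
// columns of each side, which gives [b,a,c,d,e] here.
val joined = right.join(left, "b")

// Keep the desired order as data and turn each name into a Column.
val desiredOrder = Seq("a", "b", "c", "d", "e")
val reordered = joined.select(desiredOrder.map(col): _*)
```

select returns a new dataframe and leaves joined unchanged, so the reordered result has to be assigned to a new value.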
