DenseRank() in Spark DataFrame usage?

Hi,

I need to use a dense rank to order elements by date.

JavaRDD<PageData> rddX = connector.getSparkContext().parallelize(pageData,20);

DataFrame processedData = context.createDataFrame(rddX, PageData.class);

DataFrame processedData = processedData.select(
    processedData.col("date_publication"),
    processedData.col("date_application"),
    processedData.col("id_ref"),
    processedData.col("id_unite"),
    processedData.col("libelle"),
    processedData.col("valeur"),
    processedData.col("zone"),
    processedData.col("tableau"),
    org.apache.spark.sql.functions.denseRank()
        .over(org.apache.spark.sql.expressions.Window
            .partitionBy(processedData.col("tableau"))
            .orderBy(processedData.col("libelle")))
        .alias("ordre"));

The code above gives an error. Can anyone please help me with this?

Re: DenseRank() in Spark DataFrame usage?

Hi @Rambabu Chamakuri

It might be easier to express it as an SQL statement:

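// sc is an existing JavaSparkContext.
// In Spark 1.x, window functions such as dense_rank() require a HiveContext
// (spark-hive dependency); a plain SQLContext cannot resolve them.
SQLContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());

JavaRDD<PageData> rddX = sc.parallelize(pageData, 20);
// Apply the schema to the RDD to create a DataFrame.
DataFrame processedData = sqlContext.createDataFrame(rddX, PageData.class);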

// Register the DataFrame as a table.
processedData.registerTempTable("data");

// Use SQL to express your queries.
DataFrame result = sqlContext.sql(
    "SELECT date_publication, date_application, id_ref, id_unite, libelle, valeur, zone, tableau, "
    + "dense_rank() OVER (PARTITION BY tableau ORDER BY libelle DESC) AS ordre "
    + "FROM data");
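
To make it clearer what `dense_rank() OVER (PARTITION BY tableau ORDER BY libelle DESC)` computes, here is a minimal plain-Java sketch with no Spark involved. The `Row` class is a hypothetical stand-in for `PageData`, reduced to the two columns the window uses, and it assumes `libelle` is never null. It only illustrates the ranking semantics and is not how Spark implements them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DenseRankSketch {

    // Hypothetical stand-in for PageData: only the columns used by the window.
    static final class Row {
        final String tableau;
        final String libelle; // assumed non-null in this sketch
        int ordre;            // filled in by denseRank()

        Row(String tableau, String libelle) {
            this.tableau = tableau;
            this.libelle = libelle;
        }
    }

    // Sets ordre to the dense rank of libelle (descending) within each tableau.
    static void denseRank(List<Row> rows) {
        // PARTITION BY tableau
        Map<String, List<Row>> partitions = new LinkedHashMap<>();
        for (Row r : rows) {
            partitions.computeIfAbsent(r.tableau, k -> new ArrayList<>()).add(r);
        }
        for (List<Row> partition : partitions.values()) {
            // ORDER BY libelle DESC
            partition.sort(Comparator.comparing((Row r) -> r.libelle).reversed());
            int rank = 0;
            String previous = null;
            for (Row r : partition) {
                // Ties share a rank; the next distinct value gets rank + 1, so there are no gaps.
                if (!r.libelle.equals(previous)) {
                    rank++;
                    previous = r.libelle;
                }
                r.ordre = rank;
            }
        }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("t1", "b"));
        rows.add(new Row("t1", "c"));
        rows.add(new Row("t1", "b"));
        rows.add(new Row("t1", "a"));
        rows.add(new Row("t2", "x"));
        denseRank(rows);
        for (Row r : rows) {
            System.out.println(r.tableau + " " + r.libelle + " " + r.ordre);
        }
    }
}
```

In the sample data, the two `b` rows in `t1` share rank 2 and `a` gets rank 3 rather than 4. That absence of gaps is what distinguishes `dense_rank()` from `rank()`. Each `tableau` starts again at 1.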

You may have already read them, but here are a few good resources to help you out:

Databricks' "Introducing Window Functions in Spark SQL" blog article: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

Apache Spark SQL programming guide: http://spark.apache.org/docs/1.6.2/sql-programming-guide.html