Save Spark DataFrame table into Phoenix

In some use cases, developers want to save a Spark DataFrame directly into Phoenix instead of writing it to HBase as an intermediate step. In those cases, we can use the Apache Phoenix-Spark plugin. The related API is very simple:

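import org.apache.spark.sql.SaveMode
// Spark 1.3-style call; on Spark 1.4+ the equivalent is df.write.format(...).mode(...).options(...).save()
df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE",
  "zkUrl" -> "****:2181:/****"))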

However, note that Apache Phoenix treats every column name as uppercase by default unless it is surrounded by double quotation marks (""). Therefore, if your Phoenix schema declares lowercase column names, the DataFrame column names must be wrapped in double quotes in Spark before saving. The example code is as follows:

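import org.apache.spark.sql.functions.col

// Wrap every column name in double quotes so Phoenix preserves its case
val oldNames = df.columns
val newNames = oldNames.map(name => col(name).as("\"" + name + "\""))
val df2 = df.select(newNames:_*)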
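
To see why the quoting matters, here is a minimal sketch in plain Scala (no Spark or Phoenix needed) of the case rule described above. The object `PhoenixNames` and its methods `resolve` and `quote` are hypothetical names made up for illustration and are not part of the Phoenix or Spark API. The sketch only imitates the rule that unquoted names are folded to uppercase while quoted names keep their case.

```scala
import java.util.Locale

// Hypothetical helper, for illustration only: imitates the case rule
// described above, it does not call Phoenix itself.
object PhoenixNames {
  // Quoted identifiers keep their case; unquoted ones are folded to uppercase.
  def resolve(identifier: String): String =
    if (identifier.length >= 2 && identifier.startsWith("\"") && identifier.endsWith("\""))
      identifier.substring(1, identifier.length - 1)
    else
      identifier.toUpperCase(Locale.ROOT)

  // Wrap a name in double quotes, as the column renaming above does.
  def quote(name: String): String = "\"" + name + "\""
}
```

With this sketch, an unquoted `user_id` resolves to `USER_ID` and would not match a Phoenix column declared as "user_id", whereas the quoted form resolves back to `user_id`.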
