Community Articles
Find and share helpful community-sourced technical articles
Cloudera Employee

COD - CDE using Phoenix

In this article, we will walk through steps required to be followed to read/ write using Phoenix JDBC to COD (Cloudera Operational Database) from Spark on CDE (Cloudera Data Engineering).

Assumption

  • COD is already provisioned and database cod-db is created. Refer to this link for the same.
  • CDE is already provisioned and virtual cluster is already created. Refer to this link for the same.

COD

Configuration

Spark in CDE to be able to talk to COD, it would require the hbase-site.xml config of the COD cluster. Do the following steps for retrieve the same:

  1. Go to the COD control plane UI and click on cod-db database.
  2. Under the "Connect" tab of COD database, look for HBase Client Configuration URL field. Following is the screenshot for the same.
    Screenshot 2021-01-08 at 1.10.56 PM.png
    The configuration can be downloaded using the following curl command.

 

curl -o "hbase-config.zip" -u "santhosh" "https://XXXXXX.cldr.work/clouderamanager/api/v41/clusters/cod-hvtur2ovawfr/services/hbase/clientConfig"​

 

  • Make sure to provide the "workload" password for above curl call.
  • Explore the downloaded zip file to obtain the hbase-site.xml file.
  • Note down the Phoenix JDBC url from the Phoenix (Thick) tab.

Create table in COD

We would need to create table in COD using Phoenix for this demo. In order to do the same, please login into the gateway node of the COD cluster and run phoenix-sqlline command to create table as follows.

 

CREATE TABLE OUTPUT_TEST_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);

 

The node details can be obtained from the datahub hardware tab from control plane UI. Following is the screenshot for the same.

Screenshot 2021-01-08 at 5.39.10 PM.png

Build phoenix-spark project

Build the following Spark phoenix demo maven project.

https://github.com/bgsanthosh/spark-hbase/tree/master/spark-hbase

Make sure to bump the phoenix-spark and hbase-client version as per the Cloudera runtime version.

Run the following command from the project home.

 

mvn clean compile package

 

CDE

Configure and Run Job

  1. Configure CDE CLI to point to the virtual cluster created in the above step. For more details, see Configuring the CLI client.
  2. Create resource using the following command.
    cde resource create --name odx-spark-resource​
  3. Upload hbase-site.xml
    cde resource upload --name odx-spark-resource --local-path hbase-site.xml --resource-path conf/hbase-site.xml
  4. Upload the demo app jar that was built earlier.
    cde resource upload --name odx-spark-resource --local-path ./spark-hbase/target/spark-hbase-1.0-SNAPSHOT.jar --resource-path spark-hbase-1.0-SNAPSHOT.jar
  5. Create the CDE job using the following json and import command.

 

{
"mounts": [
{
"resourceName": "odx-spark-resource"
}
],
"name": "odx-read-write",
"spark": {
"className": "org.cloudera.e2e.spark.PhoenixReadWrite",
"conf": {
"spark.executor.extraClassPath": "/app/mount/conf",
"spark.driver.extraClassPath": "/app/mount/conf"
},
"args": [ "{{ phoenix_jdbc_url }}"],
"driverCores": 1,
"driverMemory": "1g",
"executorCores": 1,
"executorMemory": "1g",
"file": "spark-hbase-1.0-SNAPSHOT.jar",
"pyFiles": [],
"files": ["conf/hbase-site.xml"],
"numExecutors": 4
}
}​​
cde job import --file odx-readwrite-var.json

 

Next, run the job using the following command:

 

cde job run --name odx-read-write --variable phoenix_jdbc_url=<phoenix_jdbc_url>​

 

Check in phoenix-sqlline, the following data should be present.

 

0: jdbc:phoenix:> select * from OUTPUT_TEST_TABLE;
+-----+-------+-------+
| ID | COL1 | COL2 |
+-----+-------+-------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
+-----+-------+-------+

 

679 Views