
Pyspark Phoenix integration failing in oozie workflow

I am connecting to and ingesting data into a Phoenix table using PySpark with the code below:

dataframe.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "tablename").option("zkUrl", "localhost:2181").save()

When I run this via spark-submit with the command below, it works fine:

spark-submit --master local --deploy-mode client --files /etc/hbase/conf/hbase-site.xml --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-"

When I run this with Oozie, I get the error below:

.ConnectionClosingException: Connection to is closing. Call id=9, waitTime=3 row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101

Below is the workflow:

<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <name>Spark Example</name>
        <spark-opts>--py-files --files /etc/hbase/conf/hbase-site.xml --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark- --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-</spark-opts>
    </spark>
    <ok to="successEmailaction"/>
    <error to="failEmailaction"/>
</action>

With spark-submit I initially got the same error and fixed it by passing the required jars. In Oozie, even though I pass the same jars, it still throws the error.


Do you have security enabled? Clients usually see this error when the server rejects the authenticated RPC.

Turn on DEBUG logging for HBase and look at the RegionServer log for the hostname that you have configured. Most of the time, this is the result of an impersonation-related configuration error. The DEBUG message in the RegionServer log will tell you who the "real" user is (who is providing Kerberos credentials) and who they are trying to impersonate (who the real user "says" they are). In your case, "oozie" would be saying that it is "you" (or whichever user you are running this application as).
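For reference, one way to turn on DEBUG logging is through the RegionServer's log4j.properties; the logger names below assume the stock HBase log4j setup:

```properties
# log4j.properties on the RegionServer (restart the RegionServer,
# or use the "Log Level" page in the HBase UI to apply at runtime):
log4j.logger.org.apache.hadoop.hbase=DEBUG
# Narrower alternative that still covers the auth/impersonation path:
log4j.logger.org.apache.hadoop.hbase.security=DEBUG
```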

From this, you can amend your `hadoop.proxyuser...` configuration properties in core-site.xml, restart HBase, and try again.
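As a rough sketch, the relevant core-site.xml entries look like this; the `oozie` user name and wildcard values are assumptions and should be tightened for a real cluster:

```xml
<!-- core-site.xml: allow the oozie service user to impersonate other
     users. In production, restrict hosts/groups instead of using "*". -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>
```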

Hi @Josh Elser, thank you so much for the answer. I checked what you said and everything looks fine. I am using the JDBC URL below as zkUrl when accessing Phoenix. My cluster is kerberized, so I am passing all credentials properly as below:
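(For context, the general form of a Phoenix JDBC URL on a kerberized cluster is shown below; the quorum, znode, principal, and keytab values are placeholders, not the poster's actual settings:)

```
jdbc:phoenix:zk1,zk2,zk3:2181:/hbase-secure:user@EXAMPLE.COM:/path/to/user.keytab
```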


The problem is that when I execute my PySpark code with this JDBC URL using spark-submit, it works fine. If I execute the same code in an Oozie workflow, it throws the exception below because of an HBase connectivity issue:

java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Mon Feb 11 07:33:05 UTC 2019, null, callTimeout=60000, callDuration=68427: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740,,16020,1545291237502, seqNum=0

Why does the same code work fine with spark-submit but not in the Oozie workflow? I copied all dependency jars into the workflow/lib folder in HDFS. How can I debug this further?

I found that "--files /etc/hbase/conf/hbase-site.xml" does not work when integrated with Oozie. I pass hbase-site.xml with a file tag in the Oozie spark action instead. It works fine now.
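A sketch of what that looks like inside the spark action, assuming hbase-site.xml has been uploaded to HDFS (the HDFS path here is a placeholder, not the poster's actual path):

```xml
<!-- Inside the <spark> element of the Oozie action: ship hbase-site.xml
     into the container's working directory, where it lands on the
     classpath. The HDFS path below is a placeholder. -->
<file>${nameNode}/user/oozie/conf/hbase-site.xml#hbase-site.xml</file>
```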