
Pyspark Phoenix integration failing in oozie workflow

I am connecting to and ingesting data into a Phoenix table using PySpark with the code below:

dataframe.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "tablename").option("zkUrl", "localhost:2181").save()

When I run this via spark-submit with the command below, it works fine:

spark-submit --master local --deploy-mode client --files /etc/hbase/conf/hbase-site.xml --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-"

When I run this with Oozie, I get the error below:

.ConnectionClosingException: Connection to is closing. Call id=9, waitTime=3 row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101

Below is the workflow:

<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <name>Spark Example</name>
        <spark-opts>--py-files --files /etc/hbase/conf/hbase-site.xml --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark- --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-</spark-opts>
    </spark>
    <ok to="successEmailaction"/>
    <error to="failEmailaction"/>
</action>

With spark-submit I initially got the same error and fixed it by passing the required jars. In Oozie, even though I pass the same jars, it still throws the error.


Do you have security enabled? Clients usually see this error when the server rejects the authenticated RPC.

Turn on DEBUG logging for HBase and look at the RegionServer log for the hostname that you have configured. Most of the time, this is the result of an impersonation-related configuration error. The DEBUG message in the RegionServer log will tell you who the "real" user is (who is providing Kerberos credentials) and who they are trying to impersonate (who the real user "says" they are). In your case, "oozie" would be saying that it is "you" (or whichever user you are running this application as).
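For reference, one way to turn on DEBUG logging is through the RegionServer's log4j.properties; the logger names below assume the stock HBase log4j setup:

```properties
# log4j.properties on the RegionServer (restart the RegionServer,
# or use the "Log Level" page in the HBase UI to apply at runtime):
log4j.logger.org.apache.hadoop.hbase=DEBUG
# Narrower alternative that still covers the auth/impersonation path:
log4j.logger.org.apache.hadoop.hbase.security=DEBUG
```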

From this, you can amend your `hadoop.proxyuser...` configuration properties in core-site.xml, restart HBase, and try again.
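As a rough sketch, the relevant core-site.xml entries look like this; the `oozie` user name and wildcard values are assumptions and should be tightened for a real cluster:

```xml
<!-- core-site.xml: allow the oozie service user to impersonate other
     users. In production, restrict hosts/groups instead of using "*". -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>
```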

Hi @Josh Elser, thank you so much for the answer. I checked what you said and everything looks fine. I am using the JDBC URL below as zkUrl when accessing Phoenix. My cluster is kerberized, so I am passing all credentials properly as below:
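(For context, the general form of a Phoenix JDBC URL on a kerberized cluster is shown below; the quorum, znode, principal, and keytab values are placeholders, not the poster's actual settings:)

```
jdbc:phoenix:zk1,zk2,zk3:2181:/hbase-secure:user@EXAMPLE.COM:/path/to/user.keytab
```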


The problem is that when I execute my PySpark code with this JDBC URL using spark-submit, it works fine. If I execute the same code in an Oozie workflow, it throws the exception below because of an HBase connectivity issue:

java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Mon Feb 11 07:33:05 UTC 2019, null, callTimeout=60000, callDuration=68427: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740,,16020,1545291237502, seqNum=0

Why does the same code work fine with spark-submit but not in the Oozie workflow? I copied all dependency jars into the workflow/lib folder in HDFS. How can I debug this further?

I found that "--files /etc/hbase/conf/hbase-site.xml" does not work when integrated with Oozie. I pass hbase-site.xml with a file tag in the Oozie spark action instead. It works fine now.
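A sketch of what that looks like inside the spark action, assuming hbase-site.xml has been uploaded to HDFS (the HDFS path here is a placeholder, not the poster's actual path):

```xml
<!-- Inside the <spark> element of the Oozie action: ship hbase-site.xml
     into the container's working directory, where it lands on the
     classpath. The HDFS path below is a placeholder. -->
<file>${nameNode}/user/oozie/conf/hbase-site.xml#hbase-site.xml</file>
```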