Support Questions

Find answers, ask questions, and share your expertise

Unable to connect spark python with phoenix in secure

avatar
Expert Contributor
from pyspark import SparkContext
from pyspark.sql import SQLContext


sc = SparkContext()
sqlc = SQLContext(sc)


df = sqlc.read \
  .format("org.apache.phoenix.spark") \
  .option("table", "TABLE1") \
  .option("zkUrl", "<host>:2181:/hbase-secure") \
  .load()


print(df)

Run using:

spark-submit --master local --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar" spark_phoenix.py

Error:

17/03/11 23:20:03 INFO ZooKeeper: Initiating client connection, connectString=<host>:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@653a27b2
17/03/11 23:20:03 INFO ClientCnxn: Opening socket connection to server <host>/10.18.106.4:2181. Will not attempt to authenticate using SASL (unknown error)
17/03/11 23:20:03 INFO ClientCnxn: Socket connection established to <host>/10.18.106.4:2181, initiating session
17/03/11 23:20:03 INFO ClientCnxn: Session establishment complete on server <host>/10.18.106.4:2181, sessionid = 0x15a7b6220540b7e, negotiated timeout = 40000
Traceback (most recent call last):
  File "/d3/app/bin/spark_phoenix.py", line 10, in <module>
    .option("zkUrl", "<host>:2181:/hbase-secure") \
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.load.
: java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Sat Mar 11 23:20:03 EST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68236: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=usor7dhc01w06.use.ucdp.net,16020,1488988574688, seqNum=0




        at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2590)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
        at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
        at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
        at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
        at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:208)
        at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
        at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
        at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
        at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:277)
        at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:106)
        at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:57)
        at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)



1 ACCEPTED SOLUTION

avatar
Super Collaborator

@nbalaji-elangovan

Copy/Symlink the hbase-site.xml under /etc/spark/conf as below:

 ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml

Once done execute the spark-submit as you had done earlier and then try again

View solution in original post

4 REPLIES 4

avatar
Master Mentor
@nbalaji-elangovan

As you mentioned that it is secured environment. So i guess In your "spark-submit" command line argument you should pass the keytab & principal information:

Example:

 --keytab /etc/security/keytabs/spark.headless.keytab --principal spark-XYZ@ABC.COM

.

avatar
Expert Contributor

That doesnt work, I tried it

avatar
Super Collaborator

@nbalaji-elangovan

Copy/Symlink the hbase-site.xml under /etc/spark/conf as below:

 ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml

Once done execute the spark-submit as you had done earlier and then try again

avatar
Expert Contributor

Worked. Thanks