Created 03-12-2017 04:51 AM
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

# Read a Phoenix table as a DataFrame; zkUrl is <host>:<port>:<znode-parent>
df = sqlc.read \
    .format("org.apache.phoenix.spark") \
    .option("table", "TABLE1") \
    .option("zkUrl", "<host>:2181:/hbase-secure") \
    .load()

print(df)
Run using:
spark-submit --master local \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  spark_phoenix.py
Error:
17/03/11 23:20:03 INFO ZooKeeper: Initiating client connection, connectString=<host>:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@653a27b2
17/03/11 23:20:03 INFO ClientCnxn: Opening socket connection to server <host>/10.18.106.4:2181. Will not attempt to authenticate using SASL (unknown error)
17/03/11 23:20:03 INFO ClientCnxn: Socket connection established to <host>/10.18.106.4:2181, initiating session
17/03/11 23:20:03 INFO ClientCnxn: Session establishment complete on server <host>/10.18.106.4:2181, sessionid = 0x15a7b6220540b7e, negotiated timeout = 40000
Traceback (most recent call last):
File "/d3/app/bin/spark_phoenix.py", line 10, in <module>
.option("zkUrl", "<host>:2181:/hbase-secure") \
File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.load.
: java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Sat Mar 11 23:20:03 EST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68236: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=usor7dhc01w06.use.ucdp.net,16020,1488988574688, seqNum=0
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2590)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:277)
at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:106)
at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:57)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Created 03-12-2017 05:04 AM
Since you mentioned it is a secured environment, I would guess you need to pass the keytab and principal information as "spark-submit" command-line arguments:
Example:
--keytab /etc/security/keytabs/spark.headless.keytab --principal spark-XYZ@ABC.COM
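For reference, a sketch of your full command with those flags added (the keytab path and principal above are placeholders; substitute the ones for your cluster):
spark-submit --master local \
  --keytab /etc/security/keytabs/spark.headless.keytab \
  --principal spark-XYZ@ABC.COM \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  spark_phoenix.py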
Created 03-12-2017 05:12 AM
That doesn't work; I tried it.
Created 03-13-2017 04:06 AM
Copy or symlink hbase-site.xml under /etc/spark/conf as below:
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml
Once that is done, run the same spark-submit command as before and try again.
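This helps because hbase-site.xml carries the client settings the Phoenix/HBase connection needs on a secured cluster, notably the Kerberos client configuration and zookeeper.znode.parent (/hbase-secure here); without it on Spark's classpath the client falls back to defaults and times out against hbase:meta, which matches the RetriesExhaustedException and the "Will not attempt to authenticate using SASL" line in the log above. If you would rather not touch /etc/spark/conf, an untested alternative sketch is to put the HBase config directory on the driver and executor classpaths instead:
spark-submit --master local \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar \
  --conf "spark.driver.extraClassPath=/etc/hbase/conf" \
  --conf "spark.executor.extraClassPath=/etc/hbase/conf:/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  spark_phoenix.py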
Created 03-13-2017 04:55 AM
Worked. Thanks