Created 03-12-2017 04:51 AM
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

df = sqlc.read \
    .format("org.apache.phoenix.spark") \
    .option("table", "TABLE1") \
    .option("zkUrl", "<host>:2181:/hbase-secure") \
    .load()

print(df)
Run using:
spark-submit --master local \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  spark_phoenix.py
Error:
17/03/11 23:20:03 INFO ZooKeeper: Initiating client connection, connectString=<host>:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@653a27b2
17/03/11 23:20:03 INFO ClientCnxn: Opening socket connection to server <host>/10.18.106.4:2181. Will not attempt to authenticate using SASL (unknown error)
17/03/11 23:20:03 INFO ClientCnxn: Socket connection established to <host>/10.18.106.4:2181, initiating session
17/03/11 23:20:03 INFO ClientCnxn: Session establishment complete on server <host>/10.18.106.4:2181, sessionid = 0x15a7b6220540b7e, negotiated timeout = 40000
Traceback (most recent call last):
  File "/d3/app/bin/spark_phoenix.py", line 10, in <module>
    .option("zkUrl", "<host>:2181:/hbase-secure") \
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.load.
: java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Sat Mar 11 23:20:03 EST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68236: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=usor7dhc01w06.use.ucdp.net,16020,1488988574688, seqNum=0
	at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2590)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
	at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
	at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
	at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
	at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:208)
	at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
	at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
	at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
	at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:277)
	at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:106)
	at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:57)
	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Created 03-12-2017 05:04 AM
Since you mentioned it is a secured environment, I guess you should pass the keytab & principal information in your spark-submit command-line arguments:
Example:
--keytab /etc/security/keytabs/spark.headless.keytab --principal spark-XYZ@ABC.COM
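A sketch of the combined command, assuming the keytab path and principal above are replaced with your cluster's actual values (note that spark-submit documents --keytab/--principal for YARN deployments, so they may not take effect with --master local):

spark-submit --master local \
  --keytab /etc/security/keytabs/spark.headless.keytab \
  --principal spark-XYZ@ABC.COM \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  spark_phoenix.py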
Created 03-12-2017 05:12 AM
That doesn't work; I tried it.
Created 03-13-2017 04:06 AM
Copy or symlink the hbase-site.xml under /etc/spark/conf as below:
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml
Once done, run the spark-submit command as you did earlier and try again.
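If modifying /etc/spark/conf is not an option, a commonly suggested alternative (untested in this thread, so treat it as a sketch) is to ship the file with spark-submit's --files option; whether the driver picks it up depends on the deploy mode:

spark-submit --master local \
  --files /etc/hbase/conf/hbase-site.xml \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.5.3.0-37.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  spark_phoenix.py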
Created 03-13-2017 04:55 AM
Worked. Thanks