Created 11-26-2017 12:57 AM
Hi,
I am trying to put together a data pipeline on the HDP 2.6.3 sandbox (Docker). I am using PySpark with Phoenix (4.7) and HBase.
I installed the Phoenix project from Maven and successfully created a table with test records. I can see the data in HBase as well.
Now I am trying to read data from the table using PySpark with the following code:
import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="Phoenix test")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark") \
    .option("table", "INPUT_TABLE") \
    .option("zkUrl", "localhost:2181:/hbase-unsecure") \
    .load()
Phoenix DDL:

CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (1, 'test_row_1', 111);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (2, 'test_row_2', 111);
Call:

spark-submit --verbose \
  --class org.apache.phoenix.spark \
  --jars /usr/hdp/current/phoenix-server/phoenix-4.7.0.2.5.0.0-1245-client.jar \
  http://repo.hortonworks.com/content/groups/public/ \
  --files /etc/spark2/conf/hbase-site.xml \
  phoenix_test.py
Error message:

Traceback (most recent call last):
  File "/root/hdp/process_data.py", line 42, in <module>
    .format(data_source_format)\
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 593, in save
  File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.UnsupportedOperationException: empty.tail
Thanks in advance.
Created 11-26-2017 06:02 PM
Found the issue. It is exactly the same problem discussed in the link below:
https://github.com/tweag/sparkle/issues/105
Going back to Spark 1.6 made it work.
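For context, the NoSuchMethodError on SQLContext.createDataFrame is a binary incompatibility: the phoenix-4.7 Spark module was compiled against the Spark 1.6 API, where that method returned a DataFrame class that no longer exists as such in Spark 2.x. A minimal sketch of a fail-fast guard (the function name and message are illustrative, not from any Phoenix API) that you could call with sc.version before load():

```python
def check_spark_major(version, expected_major):
    """Raise early if the running Spark major version does not match the one
    the Phoenix connector JAR on the classpath was compiled against."""
    major = int(version.split(".")[0])
    if major != expected_major:
        raise RuntimeError(
            "Spark %s detected, but the Phoenix connector on the classpath "
            "targets Spark %d.x" % (version, expected_major))
    return major
```

With the phoenix-4.7 client JAR you would call check_spark_major(sc.version, 1), turning the obscure py4j traceback into an explicit version-mismatch error at startup.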
Created 11-26-2017 12:57 AM
I also copied the Phoenix client and server JARs to the spark2 and hbase folders,
and updated the Spark conf file with the classes.
Created 11-26-2017 06:22 AM
Hi @John Doo,
Apparently the job is unable to pick up the table from the ZooKeeper znode you have provided.
You have given the HBase ZooKeeper znode information for Phoenix to retrieve the table information. Can you please check the Phoenix znode by changing the zkUrl to just the ZooKeeper quorum? (You can get the precise value from the hbase-site.xml file, to validate whether your ZooKeeper is running on localhost or sandbox.hortonworks.com.)
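To avoid guessing the quorum, you can read it straight out of hbase-site.xml. A small sketch, assuming the standard HBase property names and a sandbox-style file layout (the helper name and default values here are illustrative):

```python
import xml.etree.ElementTree as ET

def zk_url_from_hbase_site(path):
    """Build a Phoenix zkUrl (quorum:port:/znode) from an hbase-site.xml file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    quorum = props.get("hbase.zookeeper.quorum", "localhost")
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    znode = props.get("zookeeper.znode.parent", "/hbase")
    return "%s:%s:%s" % (quorum, port, znode)

# e.g. zk_url_from_hbase_site("/etc/spark2/conf/hbase-site.xml")
```

Passing the resulting string as the zkUrl option keeps the PySpark job in sync with whatever quorum and znode the cluster actually uses.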
On another note, Phoenix identifiers are automatically folded to upper case, so if you create a view on top of an HBase table, use capital letters on both sides (HBase and Phoenix); alternatively, you may use double quotes around the identifiers to preserve case.
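As an illustration of that folding rule (this helper just mimics Phoenix's behavior so you can predict the column names it will expect; it is not part of any Phoenix API):

```python
def phoenix_identifier(name):
    """Return the effective Phoenix identifier for a name as written in SQL:
    unquoted identifiers are folded to upper case, double-quoted ones are
    kept verbatim with their case preserved."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]   # quoted: case preserved
    return name.upper()     # unquoted: folded to upper case
```

So a column written as col1 in DDL is stored as COL1, while "col1" stays lower case, which is why an unquoted lower-case HBase column name will not match.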
Hope this helps!
Created 11-26-2017 03:25 PM
Still the same; see my answer above. Could the problem be that I installed Phoenix on my own on the sandbox? I just realized there is a Phoenix enabler for HBase in Ambari. I will restart my Docker image from scratch and use the built-in version. Is there any config I should set, or is it enough to enable it and restart HBase? Which JAR should I use in the call: the one with the version number, or phoenix-client.jar? Thanks in advance.
Checked with the built-in Phoenix service; same issue.
Created 11-26-2017 11:29 AM
I changed localhost to sandbox.hortonworks.com (which I had in the conf file for ZooKeeper).
Code:

import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="Phoenix loader")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark") \
    .option("table", "INPUT_TABLE") \
    .option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure") \
    .load()
But I am still getting the same error:

17/11/26 11:27:39 INFO MetricsSystemImpl: phoenix metrics system started
Traceback (most recent call last):
  File "/root/hdp/phoenix_test2.py", line 8, in <module>
    table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure").load()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 165, in load
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o42.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/DataFrame;
        at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:117)
Created 11-26-2017 03:43 PM
https://github.com/tweag/sparkle/issues/105
It does not seem like a ZooKeeper issue to me; they discuss the same error message there.