Created 11-26-2017 12:57 AM
Hi,
I am trying to put together a data pipeline on the HDP 2.6.3 sandbox (Docker). I am using PySpark with Phoenix (4.7) and HBase.
I installed the Phoenix project from Maven and successfully created a table with test records. I can see the data in HBase as well.
Now I am trying to read data from the table using PySpark with the following code:
import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="Phoenix test")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark") \
    .option("table", "INPUT_TABLE") \
    .option("zkUrl", "localhost:2181:/hbase-unsecure") \
    .load()
Phoenix DDL:

CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (1, 'test_row_1', 111);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (2, 'test_row_2', 111);
Call:

spark-submit --verbose \
  --class org.apache.phoenix.spark \
  --jars /usr/hdp/current/phoenix-server/phoenix-4.7.0.2.5.0.0-1245-client.jar \
  http://repo.hortonworks.com/content/groups/public/ \
  --files /etc/spark2/conf/hbase-site.xml \
  phoenix_test.py
Error message:

Traceback (most recent call last):
  File "/root/hdp/process_data.py", line 42, in <module>
    .format(data_source_format)\
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 593, in save
  File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.UnsupportedOperationException: empty.tail
Thanks in advance.
Created 11-26-2017 06:02 PM
Found the issue. It is exactly the same problem discussed in the link below:
https://github.com/tweag/sparkle/issues/105
Going back to Spark 1.6 made it work.
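For context, the NoSuchMethodError on SQLContext.createDataFrame is a binary incompatibility: the phoenix-4.7 Spark module was compiled against the Spark 1.6 API, where that method returned a DataFrame class that no longer exists as such in Spark 2.x. A minimal sketch of a fail-fast guard (the function name and message are illustrative, not from any Phoenix API) that you could call with sc.version before load():

```python
def check_spark_major(version, expected_major):
    """Raise early if the running Spark major version does not match the one
    the Phoenix connector JAR on the classpath was compiled against."""
    major = int(version.split(".")[0])
    if major != expected_major:
        raise RuntimeError(
            "Spark %s detected, but the Phoenix connector on the classpath "
            "targets Spark %d.x" % (version, expected_major))
    return major
```

With the phoenix-4.7 client JAR you would call check_spark_major(sc.version, 1), turning the obscure py4j traceback into an explicit version-mismatch error at startup.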
Created 11-26-2017 12:57 AM
I also copied the Phoenix client and server JARs to the spark2 and hbase folders,
and updated the Spark conf file with the classes.
Created 11-26-2017 06:22 AM
Hi @John Doo,
Apparently the job is unable to pick up the table from the ZooKeeper znode you have provided.
You have given the HBase ZooKeeper znode information for Phoenix to retrieve the table information. Can you please check the Phoenix znode by changing the zkUrl to just the ZooKeeper quorum? (You can get the precise value from the hbase-site.xml file, to validate whether your ZooKeeper is running on localhost or sandbox.hortonworks.com.)
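To avoid guessing the quorum, you can read it straight out of hbase-site.xml. A small sketch, assuming the standard HBase property names and a sandbox-style file layout (the helper name and default values here are illustrative):

```python
import xml.etree.ElementTree as ET

def zk_url_from_hbase_site(path):
    """Build a Phoenix zkUrl (quorum:port:/znode) from an hbase-site.xml file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    quorum = props.get("hbase.zookeeper.quorum", "localhost")
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    znode = props.get("zookeeper.znode.parent", "/hbase")
    return "%s:%s:%s" % (quorum, port, znode)

# e.g. zk_url_from_hbase_site("/etc/spark2/conf/hbase-site.xml")
```

Passing the resulting string as the zkUrl option keeps the PySpark job in sync with whatever quorum and znode the cluster actually uses.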
On another note, Phoenix identifiers are automatically folded to upper case, so if you create a view on top of an HBase table, use capital letters on both sides (HBase and Phoenix); alternatively, you may use double quotes around the identifiers to preserve case.
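As an illustration of that folding rule (this helper just mimics Phoenix's behavior so you can predict the column names it will expect; it is not part of any Phoenix API):

```python
def phoenix_identifier(name):
    """Return the effective Phoenix identifier for a name as written in SQL:
    unquoted identifiers are folded to upper case, double-quoted ones are
    kept verbatim with their case preserved."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]   # quoted: case preserved
    return name.upper()     # unquoted: folded to upper case
```

So a column written as col1 in DDL is stored as COL1, while "col1" stays lower case, which is why an unquoted lower-case HBase column name will not match.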
Hope this helps!
Created 11-26-2017 03:25 PM
Still the same; see my answer above. Could the problem be that I installed Phoenix on my own on the sandbox? I just realized there is a Phoenix enabler for HBase in Ambari. I will restart my Docker image from scratch and use the built-in version. Is there any config I should set, or is it enough to enable it and restart HBase? Which JAR should I use in the call: the one with the version number, or phoenix-client.jar? Thanks in advance.
Checked with the built-in Phoenix service; same issue.
Created 11-26-2017 11:29 AM
I changed localhost to sandbox.hortonworks.com (which I had in the conf file for ZooKeeper).
Code:

import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="Phoenix loader")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark") \
    .option("table", "INPUT_TABLE") \
    .option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure") \
    .load()
But I am still getting the same error:

17/11/26 11:27:39 INFO MetricsSystemImpl: phoenix metrics system started
Traceback (most recent call last):
  File "/root/hdp/phoenix_test2.py", line 8, in <module>
    table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure").load()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 165, in load
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o42.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/DataFrame;
        at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:117)
Created 11-26-2017 03:43 PM
https://github.com/tweag/sparkle/issues/105
It does not seem like a ZooKeeper issue to me; they discuss the same error message there.