
Spark-HBase Connection (read/write) issue with Spark 2.3.1 and HBase 2.0.0 using pyspark

New Contributor

Hi Friends,

I am trying to read/write some data between a Spark DataFrame and HBase using pyspark, but I am running into an issue. I suspect a version mismatch (I have tried different combinations of HBase connectors, but the issue persists). If it is a version incompatibility, could anyone please share the correct spark-hbase connector version that is compatible with the HDP, HBase, and Spark versions mentioned below?

 

Please note the versions I am currently using:

Name               Version
Hortonworks (HDP)  3.0.1.0-187
HBase              2.0.0
Spark2             2.3.1

 

This is the sample pyspark code I am trying (example.py):

--------------------------------------------------------------------------------------------------------------------

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

# SHC (Spark-HBase connector) data source
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

df = sc.parallelize([('a', '1.0'), ('b', '2.0')]).toDF(schema=['col0', 'col1'])

# ''.join(string.split()) collapses the whitespace so the multi-line
# JSON catalog string can be written inline.
catalog = ''.join("""{
"table":{"namespace":"default", "name":"tblEmployee", "tableCoder":"PrimitiveType"},
"rowkey":"key",
"columns":{
"col0":{"cf":"rowkey", "col":"key", "type":"string"},
"col1":{"cf":"cf", "col":"col1", "type":"string"}
}
}""".split())

# Writing: 'newtable' asks the connector to create the table with 5 regions
df.write.options(catalog=catalog, newtable='5').format(data_source_format).save()

# Reading: only the catalog is needed here
df = sqlc.read.options(catalog=catalog).format(data_source_format).load()

-------------------------------------------------------------------------------------------------------------------
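
(For reference, the equivalent pipeline written against the Spark 2 SparkSession entry point; a minimal sketch that reuses the same catalog mapping and data source as above:)

--------------------------------------------------------------------------------------------------------------------

from pyspark.sql import SparkSession

# SparkSession is the single entry point in Spark 2 (it subsumes SQLContext)
spark = SparkSession.builder.appName('hbase-example').getOrCreate()

# Same catalog mapping as above: col0 becomes the row key, col1 lives in cf
catalog = ''.join("""{
"table":{"namespace":"default", "name":"tblEmployee", "tableCoder":"PrimitiveType"},
"rowkey":"key",
"columns":{
"col0":{"cf":"rowkey", "col":"key", "type":"string"},
"col1":{"cf":"cf", "col":"col1", "type":"string"}
}
}""".split())

data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

df = spark.createDataFrame([('a', '1.0'), ('b', '2.0')], schema=['col0', 'col1'])

# Write, then read back to verify the round trip
df.write.options(catalog=catalog, newtable='5').format(data_source_format).save()
spark.read.options(catalog=catalog).format(data_source_format).load().show()

--------------------------------------------------------------------------------------------------------------------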

I am using the below spark-submit command to execute my program:

 

sudo spark-submit --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/3.0.1.0-187/0/hbase-site.xml example.py

--------------------------------------------------------------------------------------------------------------------------

The error message I am getting is as follows:

Traceback (most recent call last):
File "/home/ec2-user/src/example.py", line 23, in <module>
df.write.options(catalog=catalog, newtable=5).format(data_source_format).save()
File "/usr/lib/python2.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 732, in save
File "/usr/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/python2.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o64.save.
: java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357).................

Thanks!


Cloudera Employee

Hello Dhiwakar,

 

This error:

java.lang.NoClassDefFoundError: org/apache/spark/Logging

is raised because the connector package you are loading (com.hortonworks:shc:1.0.0-1.6-s_2.10) was built for Spark 1.6 and Scala 2.10. It references the org.apache.spark.Logging class, which was removed in Spark 2.0, so the class cannot be found on your Spark 2.3.1 cluster.

 

You can either run the code on a Spark 1.6.x cluster with the connector version you have, or keep Spark 2 and switch to a connector build that targets Spark 2 and Scala 2.11.
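
For example, with a Spark 2 build of the connector the submit command would look like the following. This is a sketch: com.hortonworks:shc-core:1.1.1-2.1-s_2.11 is the SHC build published for Spark 2.1/Scala 2.11, and you should verify the exact artifact version available in the Hortonworks repository:

sudo spark-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/3.0.1.0-187/0/hbase-site.xml example.py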