Storing dataframe into HBase using Spark

I am writing a PySpark dataframe to an HBase table on CDP 7, following this example in readwriteHBase.py (a rough sketch of my test-hbase3.py is included below the submit command). The components that I use are:

- Spark version 3.1.1
- Scala version 2.12.10
- shc-core-1.1.1-2.1-s_2.11.jar

 

The command that I use:

spark3-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml test-hbase3.py
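For context, test-hbase3.py is essentially the SHC write example cut down to a small test. A minimal sketch of it is below; the table name "test_table", the column family "cf1", and the two sample rows are placeholders for illustration rather than my real schema, while writeDF, writeCatalog, and dataSourceFormat are the names from my script:

from pyspark.sql import SparkSession


def main():
    spark = SparkSession.builder.appName("test-hbase3").getOrCreate()

    # SHC data source, as used in the readwriteHBase.py example
    dataSourceFormat = "org.apache.spark.sql.execution.datasources.hbase"

    # Catalog mapping the dataframe columns to an HBase table.
    # "test_table" and "cf1" are placeholder names for illustration.
    writeCatalog = """{
        "table": {"namespace": "default", "name": "test_table"},
        "rowkey": "key",
        "columns": {
            "key":  {"cf": "rowkey", "col": "key",  "type": "string"},
            "col1": {"cf": "cf1",    "col": "col1", "type": "string"}
        }
    }"""

    # Small sample dataframe to write
    writeDF = spark.createDataFrame(
        [("row1", "value1"), ("row2", "value2")], ["key", "col1"])

    # This is the call that fails (line 24 in the traceback below)
    writeDF.write.options(catalog=writeCatalog, newtable=5).format(dataSourceFormat).save()


if __name__ == "__main__":
    main()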

However, I got this error. It is quite long, so I have put the full log on hastebin.com:
error-logfile

Error snippet:

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 45, in <module>
    main()
  File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 24, in main
    writeDF.write.options(catalog=writeCatalog, newtable=5).format(dataSourceFormat).save()
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.save.
: java.lang.NoClassDefFoundError: scala/Product$class
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:73)
        at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:59)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)

What should I do to fix this error? I tried to find another connector but could only find the SHC connector. I'm not using any Maven repo here, but I'm not sure whether there is a missing dependency or some other problem.
