Class not found running Spark sample hbase_inputformat.py in Quickstart VM

New Contributor

 

I am unable to execute the sample program hbase_inputformat.py found in /usr/lib/spark/examples/lib/python.tar.gz

 

I get the following error...

 

 at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
 ... 30 more
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:218)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:481)
 at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
 at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:86)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:850)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:635)
 ... 35 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 

I tried adding jar files, but each of the following commands returns the same error:

 

[cloudera@quickstart ~]$ spark-submit --driver-class-path /usr/lib/spark/lib/spark-examples.jar hbase_inputformat.py master

 

[cloudera@quickstart ~]$ spark-submit --driver-class-path /usr/lib/spark/lib/spark-examples.jar --jars /usr/lib/hbase/hbase-it.jar,/usr/lib/hbase/hbase-rest.jar,/usr/lib/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hbase/lib/netty-3.6.6.Final.jar,/usr/lib/hbase/lib/netty-3.2.4.Final.jar hbase_inputformat.py master

 

[cloudera@quickstart ~]$ spark-submit --driver-class-path /usr/lib/spark/examples/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar --jars /usr/lib/hbase/hbase-it.jar,/usr/lib/hbase/hbase-rest.jar,/usr/lib/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hbase/lib/netty-3.6.6.Final.jar,/usr/lib/hbase/lib/netty-3.2.4.Final.jar hbase_inputformat.py master

 

 

 

2 REPLIES

Re: Class not found running Spark sample hbase_inputformat.py in Quickstart VM

New Contributor

Hi,

 

We are running into the exact same issue with CDH 5.4.1. We upgraded from CDH 5.2 to CDH 5.4.1, and it was working fine before. Please let me know if you found any resolution to it.

 

 

We are using Spark on HBase in a kerberized environment.

Re: Class not found running Spark sample hbase_inputformat.py in Quickstart VM

Super Collaborator

There has been a change in the indirect dependencies that get added by Spark. Spark itself has no dependency on HBase and thus will not have any HBase jars on its classpath by default. The Hive integration does, however, and it used to pull in all the classes needed to run an HBase application on Spark without you having to do anything. Hive and HBase have since changed, and this is no longer the case. That is the cause of this "breakage".

 

However, an application should not have relied on this indirect loading of jars in the first place; you need to add whatever you need to the classpath yourself.
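Before adding jars, it can help to confirm which jar actually ships the missing class. Zip entry names are stored as plain bytes inside a jar, so a binary grep over the jar files is enough; the helper below is my own sketch, not something from CDH:

```shell
# Sketch of a helper (the function name is mine): print which of the
# given jars contain a class-file entry. Works because zip archives
# store entry names verbatim, so grep can find them without unpacking.
find_class_jars() {
  class_entry="$1"; shift          # e.g. org/apache/htrace/Trace.class
  grep -l "$class_entry" "$@" 2>/dev/null
}

# Usage on the Quickstart VM (paths taken from this thread):
#   find_class_jars 'org/apache/htrace/Trace.class' /usr/lib/hbase/lib/*.jar
```

Whatever jar this reports is the one that must end up on both the driver and the executor classpath.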

 

This is the workaround to get it working (on a parcel-based distribution managed by Cloudera Manager):

  • add the htrace jar to the executor classpath via the following setting:
    • log in to Cloudera Manager
    • go to the Spark on YARN service
    • go to the Configuration tab
    • type "defaults" in the search box
    • select Gateway in the Scope filter
    • add the entry:
      spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar
    • save the change; an icon to deploy the client configuration will appear (this can take about 30 seconds to show)
    • deploy the client configuration
  • run the Spark application accessing HBase by executing the following:
spark-submit --master yarn-cluster --driver-class-path /etc/hbase/conf:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar ....

If you are not using CM, you can make the changes manually, as long as you make sure that the htrace jar (that specific version) is on the path.
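As a rough sketch, the manual equivalent might look like the following. The paths assume the parcel layout from this thread, and the /etc/spark/conf/spark-defaults.conf location is an assumption, so adjust both for your install:

```shell
# Sketch only: manual (non-CM) equivalent of the workaround above.
# HTRACE_JAR path assumes the parcel layout mentioned in this thread.
HTRACE_JAR=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar

# Executor side: what CM's "deploy client configuration" would write out
# (conf file location is an assumption; adjust for your install).
echo "spark.executor.extraClassPath=${HTRACE_JAR}" | sudo tee -a /etc/spark/conf/spark-defaults.conf

# Driver side: pass the HBase client config and the htrace jar at submit time.
spark-submit --master yarn-cluster \
  --driver-class-path /etc/hbase/conf:${HTRACE_JAR} \
  hbase_inputformat.py
```

Note the conf entry covers the executors while --driver-class-path covers the driver; both are needed, matching the two halves of the CM workaround.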

 

Wilfred

 
