Any job I run that involves HBase access results in the errors below. My own jobs are in Scala, but the supplied Python examples fail the same way. The cluster is running CDH 5.4.4. The same jobs run fine on a different cluster with CDH 5.3.1. Any help is greatly appreciated!
15/08/15 21:46:30 WARN TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
15/08/15 21:46:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, p-web-hdp-d12.nens.local): java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
... 14 more
The stack trace is not complete, but check the full log for a mention of a missing HTrace class. The HTrace dependency was a big change in CDH 5, and it is often the cause of issues like this.
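The task-level message only points at "previous logs lines", so one quick way to look for the HTrace error is to scan the task's full log (on YARN, for example, the output of `yarn logs -applicationId <application id>`). This is a minimal Python sketch. The function name `find_missing_htrace` is hypothetical. The pattern is an assumption: it only requires "htrace" somewhere in the reported class name, because the HTrace package name has changed between releases.

```python
import re
import sys

# A missing class normally surfaces as one of these two JVM errors. Only the
# substring "htrace" is required in the class name (assumption: the package
# name varies between HTrace releases, so the match is kept loose).
MISSING_HTRACE = re.compile(
    r"(NoClassDefFoundError|ClassNotFoundException):\s*(\S*htrace\S*)",
    re.IGNORECASE,
)


def find_missing_htrace(log_text):
    """Return (line number, class name) for each log line reporting a missing HTrace class."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = MISSING_HTRACE.search(line)
        if match:
            hits.append((lineno, match.group(2)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_htrace.py path/to/full_task.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for lineno, cls in find_missing_htrace(f.read()):
            print(f"line {lineno}: missing {cls}")
```

An empty result does not rule HTrace out. It only means that no line in the log matched this particular pattern.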
To fix that, add /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar (check that the path matches your installation) to both the spark.driver.extraClassPath and spark.executor.extraClassPath settings.
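Both settings need the same jar, so it can help to generate the spark-submit options together. This is a minimal Python sketch. Its assumptions are that the jar lives at the path given above on every node, as with a parcel install, and that `extra_classpath_args` and `my_hbase_job.py` are hypothetical names. Values passed with `--conf` on the command line take precedence over spark-defaults.conf, so the sketch keeps any value you already had after the new jar.

```python
import os
import shlex

# Path from the answer; verify it on your own nodes, since it depends on the
# parcel layout and the HTrace version shipped with your CDH release.
HTRACE_JAR = "/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar"

CLASSPATH_PROPS = ("spark.driver.extraClassPath", "spark.executor.extraClassPath")
JAVA_CP_SEP = ":"  # Java classpath separator on Linux cluster nodes


def extra_classpath_args(jars, existing=None):
    """Build spark-submit --conf arguments that add `jars` to both the driver
    and the executor extra classpath.

    `existing` optionally maps a property name to a value already configured
    (for example in spark-defaults.conf). It is appended after the new jars,
    because --conf replaces that value instead of extending it.
    """
    existing = existing or {}
    args = []
    for prop in CLASSPATH_PROPS:
        entries = list(jars)
        if existing.get(prop):
            entries.append(existing[prop])
        args += ["--conf", f"{prop}={JAVA_CP_SEP.join(entries)}"]
    return args


if __name__ == "__main__":
    # my_hbase_job.py is a placeholder for your own application.
    if not os.path.exists(HTRACE_JAR):
        print(f"warning: {HTRACE_JAR} not found on this machine")
    print(shlex.join(["spark-submit", *extra_classpath_args([HTRACE_JAR]), "my_hbase_job.py"]))
```

Setting both properties on the command line, rather than through SparkConf inside the application, matters for the driver: in client mode the driver JVM has already started by the time application code runs.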
The second thing to check is that you use the correct HBase jars from CDH and that there are no old HBase versions on your classpath.
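Mixed HBase versions can be spotted by grouping the HBase jars on a classpath string by version. Possible inputs include the value of your extraClassPath settings or the output of `hbase classpath`. This is a minimal Python sketch. The function name `hbase_jar_versions` is hypothetical, and the jar naming pattern (`hbase[-module]-<version>[-tests].jar`) is an assumption about how CDH names its HBase jars.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: hbase[-module...]-<version>[-tests].jar, for example
# hbase-client-<version>.jar or hbase-hadoop2-compat-<version>.jar. Module
# parts start with a letter, so the version begins at the first "-<digit>".
HBASE_JAR = re.compile(r"^(hbase(?:-[a-z][a-z0-9]*)*)-(\d[\w.\-]*?)(?:-tests)?\.jar$")


def hbase_jar_versions(classpath, sep=":"):
    """Group the HBase jars on a Java classpath string by version.

    More than one key in the result means HBase versions are mixed.
    """
    versions = defaultdict(list)
    for entry in classpath.split(sep):
        match = HBASE_JAR.match(os.path.basename(entry))
        if match:
            versions[match.group(2)].append(entry)
    return dict(versions)
```

This only inspects file names. Old HBase classes bundled inside your own application (assembly) jar will not show up, so that jar's dependencies are worth checking separately.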
If that does not solve it, please provide more details, such as the full stack trace.