
Solr indexing exception using CDH Cloudera Express 5.5.1

I am running a MapReduce job that indexes a Solr collection. The code for the job is packaged in a shaded jar.


The job runs on a CDH cluster (Cloudera Express 5.5.1). When the processing of the data completes and the data is indexed into the Solr collection, I see the following exception in the logs. I believe this is caused by an incompatible SolrJ jar that ships with CDH 5.5.1: when I run the job locally using SolrJ 4.6.0, it runs without any error. I believe the SolrJ version in CDH is 4.13.0 and that it is taking precedence over the SolrJ jar contained in my shaded jar.


I am not 100% sure, but this appears to occur only on the node that is both a NameNode and a DataNode; nodes in the cluster that are DataNode-only do not seem to have this issue.


How do I make sure that the jar contained in the shaded jar takes precedence over the Solr jars that come with CDH?
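
For reference, the two Hadoop job properties usually mentioned for this kind of conflict are `mapreduce.job.user.classpath.first` and `mapreduce.job.classloader`. Below is a minimal sketch that only builds the corresponding `-D` flags. The class name `UserClasspathFirst` is hypothetical. It is an assumption that either property resolves this particular error, and `-D` flags are only picked up if the job driver parses Hadoop generic options (for example through `ToolRunner`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: it only formats command-line flags and does not touch Hadoop itself.
final class UserClasspathFirst {

    // Candidate settings, normally tried one at a time rather than together.
    // The property names are real Hadoop MapReduce properties; that they fix
    // this specific SolrJ conflict is an assumption.
    static Map<String, String> candidates() {
        Map<String, String> settings = new LinkedHashMap<>();
        // Put the job jar ahead of the cluster-provided jars on the task classpath.
        settings.put("mapreduce.job.user.classpath.first", "true");
        // Alternative: load job classes in an isolated job classloader.
        settings.put("mapreduce.job.classloader", "true");
        return settings;
    }

    // Render each setting as a generic-option flag, e.g. -Dkey=value.
    static List<String> candidateFlags() {
        List<String> flags = new ArrayList<>();
        for (Map.Entry<String, String> e : candidates().entrySet()) {
            flags.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return flags;
    }

    public static void main(String[] args) {
        for (String flag : candidateFlags()) {
            System.out.println(flag);
        }
    }
}
```

Inside a driver, the same keys could instead be set on the job's `Configuration` with `conf.setBoolean(key, true)` before the job is submitted.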


Error that appears in the log.


2017-01-08 02:06:37,909 INFO [FetcherThread] org.apache.nutch.fetcher.Fetcher: fetch of  failed with: java.lang.NoSuchMethodError: org.apache.solr.client.solrj.impl.LBHttpSolrServer.<init>(Lorg/apache/http/client/HttpClient;[Ljava/lang/String;)V
    at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(
    at com.dynaobject.DynaOCrawlerUtils.SolrCallbackForNXParser.<init>(
    at com.dynaobject.DynaOCrawlerUtils.SolrDynaOUtils.populateSolrIndexFromCurrentURL(
    at org.apache.nutch.fetcher.Fetcher$
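

To confirm which jar actually supplies the classes named in the trace, something like the following could be called from inside the task and its output written to the task log. This is a minimal sketch: `JarLocator` is a hypothetical name, and the only class names it uses are the ones that appear in the exception above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from.
final class JarLocator {

    // Returns "<class name> -> <jar or directory URL>", or a note when the
    // class cannot be loaded or has no recorded code source (JDK classes).
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = JarLocator.class.getClassLoader();
        }
        try {
            // 'false' avoids running static initializers just to inspect the class.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return className + " -> (bootstrap or unknown source)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace and the method descriptor above.
        String[] names = {
            "org.apache.solr.client.solrj.impl.CloudSolrServer",
            "org.apache.solr.client.solrj.impl.LBHttpSolrServer",
            "org.apache.http.client.HttpClient"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If `CloudSolrServer` and `LBHttpSolrServer` resolve to different jars, or to a CDH jar rather than the shaded jar, that would be consistent with the `NoSuchMethodError` coming from mixed SolrJ versions on the classpath.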