02-12-2014 01:08 PM
I have a large pile of web pages in HBase that I'm trying to index into Cloudera Search following the online docs. I'm running the job like so:
hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.3-search-1.1.0-job.jar --hbase-table-name clueweb12 --zk-host 192.168.0.1/solr --collection cw12 --morphline-file morphlines.conf --hbase-indexer-file morphline-hbase-mapper.xml --reducers 0
... and this runs just fine: documents are indexed following the morphline spec I gave it. Except that it runs everything as a local job on the machine I launched it from. In other words, there are no mappers anywhere else on my cluster, and the log messages come from INFO mapred.LocalJobRunner. At this rate it'll take several months ;-)
The cluster is otherwise working fine: MR and MRv2 jobs work, HDFS is all OK, HBase is fine, Solr is fine, all on CDH4.5. I get an odd error message, but it doesn't stop the job:
14/02/12 09:35:32 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.0.7 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '7.0.168.192.in-addr.arpa'
I don't know whether this is a red herring. It shouldn't be happening: everything uses static IPs in /etc/hosts. And as I said, everything else works; it's just that this particular jar won't run in parallel.
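For context on that error: as far as I know, the message comes from a reverse (PTR) lookup sent to a DNS server, which would not consult /etc/hosts. So it may just mean the reverse DNS records are missing, and may be unrelated to the job running locally. A minimal sketch of how the looked-up name is derived from the address, where `reverse_name` is a hypothetical helper and not part of any Hadoop tool:

```shell
# Hypothetical helper: build the PTR lookup name for an IPv4 address
# by reversing its four octets and appending ".in-addr.arpa".
reverse_name() {
  echo "$1" | awk -F'.' '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

reverse_name 192.168.0.7   # prints 7.0.168.192.in-addr.arpa
```

Comparing `getent hosts 192.168.0.7` (which goes through /etc/hosts) with `host 192.168.0.7` (which asks the DNS server) should show whether only the DNS side lacks the record.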
How do I figure out why this job won't go MR?
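One place to start, as a rough sketch: LocalJobRunner is what a client falls back to when its configuration does not name a cluster, since both `mapred.job.tracker` (MR1) and `mapreduce.framework.name` (MR2) default to `local`. The check below assumes the client config lives in `$HADOOP_CONF_DIR` or `/etc/hadoop/conf`, which may not match the configuration this particular jar actually loads. `get_prop` and `likely_local` are hypothetical helpers, and the XML parsing is deliberately naive.

```shell
# Print the value of one property from a Hadoop *-site.xml file,
# or "(unset)" if the file or the property is missing.
get_prop() {
  [ -r "$1" ] || { echo "(unset)"; return 0; }
  # Flatten the file to one line, then pull out the <value> that follows <name>.
  value=$(tr -d '\n' < "$1" |
    sed -n "s|.*<name>[[:space:]]*$2[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</value>.*|\1|p")
  echo "${value:-(unset)}"
}

# Heuristic only: succeed (exit 0) when neither property points at a cluster.
likely_local() {
  framework=$(get_prop "$1/mapred-site.xml" mapreduce.framework.name)
  tracker=$(get_prop "$1/mapred-site.xml" mapred.job.tracker)
  echo "mapreduce.framework.name = $framework"
  echo "mapred.job.tracker = $tracker"
  case $framework in "(unset)"|local) ;; *) return 1 ;; esac
  case $tracker in "(unset)"|local) ;; *) return 1 ;; esac
  return 0
}

# Assumed location of the client configuration; adjust as needed.
conf_dir=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
if likely_local "$conf_dir"; then
  echo "Config in $conf_dir does not name a cluster; jobs would likely use LocalJobRunner."
else
  echo "Config in $conf_dir names a cluster."
fi
```

If this reports a cluster while the indexer job still runs locally, the job is probably not reading that directory, and the classpath it is launched with would be the next thing to look at.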