New Contributor
Posts: 8
Registered: 02-12-2014
Accepted Solution

Batch indexing from HBase using MR

I have a large pile of web pages in HBase that I'm trying to index into Cloudera Search following the online docs.  I'm running the job like so:


hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.3-search-1.1.0-job.jar --hbase-table-name clueweb12 --zk-host --collection cw12 --morphline-file morphlines.conf --hbase-indexer-file morphline-hbase-mapper.xml --reducers 0


... and this runs just fine: documents are indexed following the morphline spec I gave it.  Except that it's running everything as a local job on the machine I launched the job from.  In other words, there are no mappers anywhere else on my cluster, and the log messages come from INFO mapred.LocalJobRunner.  At this rate it'll take several months ;-)


The cluster is otherwise working fine... MR and MRv2 jobs work, HDFS is all OK, HBase is fine, Solr is fine, all on CDH4.5.  I get an odd error message, but it doesn't stop the job:


14/02/12 09:35:32 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for / because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name ''

I don't know if this is a red herring or not.  It shouldn't be happening... everything is using static IPs in /etc/hosts.  And as I said, everything else is working; it's just that this particular jar won't run in parallel.


How do I figure out why this job won't go MR?





Re: Batch indexing from HBase using MR

I figured out my problem. I forgot to export HADOOP_MAPRED_HOME.
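
For anyone hitting the same thing, here is a rough sketch of the fix. The paths are assumptions based on a typical CDH4 package install (/usr/lib/hadoop-0.20-mapreduce for MRv1, /usr/lib/hadoop-mapreduce for YARN/MRv2); they were not part of the original post, so check where MapReduce actually lives on your machines.

```shell
# Assumed CDH4 package location for MRv1. For YARN (MRv2) the usual
# location is /usr/lib/hadoop-mapreduce. Adjust to match your install.
# An already-set value is kept as-is.
export HADOOP_MAPRED_HOME="${HADOOP_MAPRED_HOME:-/usr/lib/hadoop-0.20-mapreduce}"

# Warn (but do not fail) if the directory is missing on this machine.
if [ ! -d "$HADOOP_MAPRED_HOME" ]; then
  echo "warning: $HADOOP_MAPRED_HOME does not exist here" >&2
fi

echo "HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME"

# Then rerun the same "hadoop jar ... hbase-indexer-mr ...-job.jar" command
# from this shell, so the launcher inherits the variable.
```

If the job output still shows mapred.LocalJobRunner after this, the variable is probably pointing at the wrong MapReduce install for the framework your cluster runs.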