Support Questions

isoboroff · 02-12-2014

I have a large pile of web pages in HBase that I'm trying to index into Cloudera Search following the online docs. I'm running the job like so:

hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.3-search-1.1.0-job.jar --hbase-table-name clueweb12 --zk-host 192.168.0.1/solr --collection cw12 --morphline-file morphlines.conf --hbase-indexer-file morphline-hbase-mapper.xml --reducers 0

... and this runs just fine: documents are indexed following the morphline spec I gave it. Except it's running everything as a local job on the machine I launched the job from. In other words, there are no mappers anywhere else on my cluster, and the log messages come from INFO mapred.LocalJobRunner. At this rate it'll take several months 😉

The cluster is otherwise working fine... MR and MRv2 jobs work, HDFS all ok, HBase fine, Solr fine, all on CDH4.5. I get an odd error message but it doesn't stop the job:

14/02/12 09:35:32 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.0.7 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '7.0.168.192.in-addr.arpa'

I don't know if this is a red herring or not. It shouldn't be happening... everything is using static IPs in /etc/hosts. And as I said, everything else is working; it's just that this particular jar won't run in parallel.

How do I figure out why this job won't go MR?

Thanks,

Ian

isoboroff · 02-19-2014

I figured out my problem. I forgot to export HADOOP_MAPRED_HOME.
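
For anyone hitting the same thing, here is a rough sketch of that fix. The directory below is an assumption, not something stated in this thread: on package-based CDH4 installs MRv1 usually lives in /usr/lib/hadoop-0.20-mapreduce and MRv2 in /usr/lib/hadoop-mapreduce, so adjust it to whatever your cluster actually uses. Presumably, without the variable the hadoop launcher can't find the cluster's MapReduce client setup and falls back to LocalJobRunner.

```shell
#!/bin/sh
# Assumption: MRv1 on a package-based CDH4 install. Change this path if your
# cluster uses MRv2/YARN or a parcel install. An already-exported value wins.
export HADOOP_MAPRED_HOME="${HADOOP_MAPRED_HOME:-/usr/lib/hadoop-0.20-mapreduce}"

# Warn early if the directory is missing, since a wrong path would not help.
if [ ! -d "$HADOOP_MAPRED_HOME" ]; then
  echo "warning: $HADOOP_MAPRED_HOME does not exist; adjust the path" >&2
fi

# Re-run the same indexing job as in the question, now with the variable set.
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.3-search-1.1.0-job.jar \
    --hbase-table-name clueweb12 \
    --zk-host 192.168.0.1/solr \
    --collection cw12 \
    --morphline-file morphlines.conf \
    --hbase-indexer-file morphline-hbase-mapper.xml \
    --reducers 0
else
  echo "hadoop not found on PATH; skipping job launch" >&2
fi
```

If the export took effect, the job output should no longer show mapred.LocalJobRunner lines, and the mappers should appear on the cluster instead of only on the launching machine.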



Batch indexing from HBase using MR