Batch indexing from HBase using MR

I have a large pile of web pages in HBase that I'm trying to index into Cloudera Search following the online docs.  I'm running the job like so:

 

hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.3-search-1.1.0-job.jar --hbase-table-name clueweb12 --zk-host 192.168.0.1/solr --collection cw12 --morphline-file morphlines.conf --hbase-indexer-file morphline-hbase-mapper.xml --reducers 0

 

... and this runs just fine: documents are indexed according to the morphline spec I gave it.  The problem is that everything runs as a local job on the machine I launched it from.  In other words, there are no mappers anywhere else on my cluster, and the log messages come from INFO mapred.LocalJobRunner.  At this rate it'll take several months 😉

 

The cluster is otherwise working fine: MR and MRv2 jobs work, HDFS is OK, HBase and Solr are fine, all on CDH4.5.  I do get an odd error message, but it doesn't stop the job:

 

14/02/12 09:35:32 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.0.7 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '7.0.168.192.in-addr.arpa'

I don't know if this is a red herring or not.  It shouldn't be happening, since everything uses static IPs in /etc/hosts.  And as I said, everything else is working; it's just that this particular jar won't run in parallel.

 

How do I figure out why this job won't run as a distributed MapReduce job?

Thanks,

Ian

 

1 ACCEPTED SOLUTION

I figured out my problem. I forgot to export HADOOP_MAPRED_HOME.
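
For anyone hitting the same symptom, here is a minimal sketch of that fix. The path is an assumption based on the usual package-based CDH4 layout, not something stated in this thread: MRv1 is typically under /usr/lib/hadoop-0.20-mapreduce and MRv2 (YARN) under /usr/lib/hadoop-mapreduce, so adjust it for your install. Run it in the same shell you launch the job from.

```shell
# Assumed default: MRv1 location on a package-based CDH4 install.
# For MRv2 (YARN) the usual location is /usr/lib/hadoop-mapreduce.
# An already-set HADOOP_MAPRED_HOME is left untouched.
: "${HADOOP_MAPRED_HOME:=/usr/lib/hadoop-0.20-mapreduce}"
export HADOOP_MAPRED_HOME

# Show what the hadoop launcher will see, and warn if the directory is missing.
echo "HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME"
if [ ! -d "$HADOOP_MAPRED_HOME" ]; then
  echo "warning: $HADOOP_MAPRED_HOME does not exist on this host" >&2
fi

# Now rerun the same 'hadoop jar' command from the question in this shell.
```

If the variable is picked up, the job log should presumably stop showing mapred.LocalJobRunner and the job should appear on the cluster instead.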
