Created on 11-19-2014 02:15 PM - edited 09-16-2022 02:13 AM
Hi
Last week I was experimenting with the Lily HBase Indexer service. I downloaded the Cloudera QuickStart VM, configured the morphline, and the indexing seems to work well. However, when I manually install HBase, SolrCloud and the Lily indexer service - bear in mind that all of these were downloaded from the Cloudera download page - I get the error below.
I have two VMs set up as follows:
The command to add the MapReduce job:
hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.2.0-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -Dmapred.child.java.opts=-Xmx500m \
  --log4j /etc/hbase-solr/conf/log4j.properties \
  --hbase-indexer-zk hbase1:2181,hbase2:2181,hbase3:2181 \
  --hbase-indexer-file /etc/hbase-solr/conf/morphline-indexer-mapper.xml \
  --hbase-indexer-name portalaudit \
  --zk-host hbase1:2181,hbase2:2181,hbase3:2181/solr \
  --collection portal-audit \
  --go-live
This spits out a lot of output, but when it gets to staging the dependency JARs, Lily looks for them under hdfs://. There are a handful of posts on the Internet describing the same problem, but none of them have decent answers. The only one with a possible answer suggests uploading the JARs into HDFS, but that feels wrong: it is a complete workaround that will probably break at some point.
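For reference, that suggested workaround amounts to mirroring the local lib directory into HDFS so the submitter finds the JARs at the path it expects. A rough sketch, based on the path in the exception further down (every other missing JAR would need the same treatment):

# Recreate the local Hadoop lib path on HDFS and copy in the JAR the
# submitter fails to find (path taken from the exception below).
hdfs dfs -mkdir -p /usr/lib/hadoop/lib
hdfs dfs -put /usr/lib/hadoop/lib/guava-11.0.2.jar /usr/lib/hadoop/lib/
# ...and so on for each JAR the job subsequently reports as missing.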
The exception when adding the MapReduce job:
14/11/19 16:12:49 INFO zookeeper.ClientCnxn: EventThread shut down
14/11/19 16:12:49 INFO hadoop.ForkedMapReduceIndexerTool: Indexing data into 1 reducers
14/11/19 16:12:49 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
14/11/19 16:12:50 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root850902700/.staging/job_local850902700_0001
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://3xNodeHA/usr/lib/hadoop/lib/guava-11.0.2.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1083)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
Ignoring the above error results in a failure to index the relevant fields in Solr. Any help is very much appreciated.
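To illustrate the mismatch (the HDFS URI is the one from the stack trace; the local path is where the CDH packages put the JAR):

# The JAR is present on the local filesystem of the submitting host:
ls -l /usr/lib/hadoop/lib/guava-11.0.2.jar
# ...but the job submitter resolves it against the cluster filesystem instead:
hdfs dfs -ls hdfs://3xNodeHA/usr/lib/hadoop/lib/guava-11.0.2.jar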
Thanks
Ayache
Created 11-21-2014 08:00 AM
It turned out that the Hadoop job above is only required for batch indexing, so it is not needed for now. All seems to be working fine now.
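For anyone hitting the same thing: the near-real-time path only needs the indexer registered with the Lily HBase Indexer service, which does not involve the MapReduce job at all. Roughly along these lines - the name, indexer XML and ZooKeeper quorum below are simply the ones from my original command, so treat this as a sketch rather than the exact invocation:

# Register the indexer with the Lily HBase Indexer service for NRT indexing;
# no MapReduce job is involved in this path.
hbase-indexer add-indexer \
  --name portalaudit \
  --indexer-conf /etc/hbase-solr/conf/morphline-indexer-mapper.xml \
  --connection-param solr.zk=hbase1:2181,hbase2:2181,hbase3:2181/solr \
  --connection-param solr.collection=portal-audit \
  --zookeeper hbase1:2181,hbase2:2181,hbase3:2181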