Created on 11-19-2014 02:15 PM - edited 09-16-2022 02:13 AM
Hi
Last week I was experimenting with the Lily HBase Indexer service. I downloaded the Cloudera QuickStart VM, configured the morphline, and the indexing seems to work well. However, when I manually install HBase, SolrCloud and the Lily indexer service - bear in mind that all of these were downloaded from the Cloudera download page - I get the error below.
I have two VMs set up as follows:
The command to add the MapReduce job:
hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.2.0-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -Dmapred.child.java.opts=-Xmx500m \
  --log4j /etc/hbase-solr/conf/log4j.properties \
  --hbase-indexer-zk hbase1:2181,hbase2:2181,hbase3:2181 \
  --hbase-indexer-file /etc/hbase-solr/conf/morphline-indexer-mapper.xml \
  --hbase-indexer-name portalaudit \
  --zk-host hbase1:2181,hbase2:2181,hbase3:2181/solr \
  --collection portal-audit \
  --go-live
This spits out a lot of output, but when it gets to staging the dependency JARs, Lily looks for them under hdfs://. There are a handful of posts on the Internet describing the same problem, but none of them have decent answers. The only one with a possible answer suggests uploading the JARs into HDFS, but that feels wrong: it is a complete workaround that will probably break at some point.
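For reference, that suggested workaround amounts to mirroring the local lib directory into HDFS so the submitter finds the JARs at the path it expects. A rough sketch, based on the path in the exception further down (every other missing JAR would need the same treatment):

# Recreate the local Hadoop lib path on HDFS and copy in the JAR the
# submitter fails to find (path taken from the exception below).
hdfs dfs -mkdir -p /usr/lib/hadoop/lib
hdfs dfs -put /usr/lib/hadoop/lib/guava-11.0.2.jar /usr/lib/hadoop/lib/
# ...and so on for each JAR the job subsequently reports as missing.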
The exception when adding the MapReduce job:
14/11/19 16:12:49 INFO zookeeper.ClientCnxn: EventThread shut down
14/11/19 16:12:49 INFO hadoop.ForkedMapReduceIndexerTool: Indexing data into 1 reducers
14/11/19 16:12:49 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
14/11/19 16:12:50 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root850902700/.staging/job_local850902700_0001
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://3xNodeHA/usr/lib/hadoop/lib/guava-11.0.2.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1083)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
Ignoring the above error results in a failure to index the relevant fields in Solr. Any help is very much appreciated.
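To illustrate the mismatch (the HDFS URI is the one from the stack trace; the local path is where the CDH packages put the JAR):

# The JAR is present on the local filesystem of the submitting host:
ls -l /usr/lib/hadoop/lib/guava-11.0.2.jar
# ...but the job submitter resolves it against the cluster filesystem instead:
hdfs dfs -ls hdfs://3xNodeHA/usr/lib/hadoop/lib/guava-11.0.2.jar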
Thanks
Ayache
Created 11-21-2014 08:00 AM
It turned out that the Hadoop job above is only required for batch indexing, so it is not needed for now. All seems to be working fine now.
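For anyone hitting the same thing: the near-real-time path only needs the indexer registered with the Lily HBase Indexer service, which does not involve the MapReduce job at all. Roughly along these lines - the name, indexer XML and ZooKeeper quorum below are simply the ones from my original command, so treat this as a sketch rather than the exact invocation:

# Register the indexer with the Lily HBase Indexer service for NRT indexing;
# no MapReduce job is involved in this path.
hbase-indexer add-indexer \
  --name portalaudit \
  --indexer-conf /etc/hbase-solr/conf/morphline-indexer-mapper.xml \
  --connection-param solr.zk=hbase1:2181,hbase2:2181,hbase3:2181/solr \
  --connection-param solr.collection=portal-audit \
  --zookeeper hbase1:2181,hbase2:2181,hbase3:2181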