Created 12-10-2015 02:05 PM
Hi,
I tried to index the files in a folder on HDFS; my Solr configuration is the following:
./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr
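For reference, a couple of commands to sanity-check that setup (a sketch only; the HDFS URI matches the command above and the zkcli.sh path assumes the default HDP Search install location, so adjust both to your environment):

# Confirm the HDFS directory Solr is supposed to write its index to exists and is readable
hdfs dfs -ls hdfs://10.0.2.15:8020/user/solr

# Confirm Solr has written its cluster state into ZooKeeper
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 10.0.2.15:2181 -cmd list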
when I launch:
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr
I get the following error:
Solr server not available on: http://10.0.2.15:2181/solr Make sure that collection [Collezione] exists
The collection exists and is valid, but it looks like it is not able to contact the server.
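For reference, the collection can also be checked directly against Solr rather than ZooKeeper (a sketch; the host and the 8983 port come from the solr start command above):

# List the collections Solr itself knows about
curl "http://10.0.2.15:8983/solr/admin/collections?action=LIST&wt=json"

# Inspect the cluster state of the one collection in question
curl "http://10.0.2.15:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=Collezione&wt=json"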
I'd really appreciate some help in solving this problem.
Davide
Created 12-13-2015 10:00 PM
@Davide Isoardi I was able to fix your problem; please try the following solution:
1) Create a JAAS file called jaas.conf.
This file can be empty; it doesn't really matter, since your environment is not kerberized. (A minimal sketch of creating it is at the end of this reply.)
2) Start your job with the following command:
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr
The order of the parameters needs to be the same as in the above command; otherwise the job might not work.
I believe this is a bug. Could you please report this issue to Lucidworks? Thanks.
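For step 1, a minimal sketch of creating the JAAS file (the empty file is all the job needs here since the cluster is not kerberized; the keytab path and principal in the commented-out kerberized variant are placeholders, not values from this thread):

# Non-kerberized cluster: an empty jaas.conf is enough
touch jaas.conf

# Kerberized cluster (for comparison only): the file would typically contain a Client section, e.g.
# cat > jaas.conf <<'EOF'
# Client {
#   com.sun.security.auth.module.Krb5LoginModule required
#   useKeyTab=true
#   keyTab="/etc/security/keytabs/solr.service.keytab"
#   storeKey=true
#   useTicketCache=false
#   principal="solr/horton01.example.com@EXAMPLE.COM";
# };
# EOF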
Created 03-08-2016 03:41 PM
Hello everyone!
I'm struggling with the same problem. I've installed Hortonworks 2.3 on three machines using the installation guide, and then installed HDP Search according to the docs as well, so the current state of my configs is pretty much out of the box. I can run all the steps properly but fail at this last one.
The collection exists, my cluster is not kerberized, I'm using all the ZooKeeper instances, and I've tried running it without the /solr chroot, but nothing changes.
Update:
I've also followed the recommended practice to clean up and chroot my SolrCloud, following the post Best Practice: 'chroot' your Solr Cloud in ZooKeeper.
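For anyone following along, the chroot step from that post boils down to creating the /solr znode and then pointing Solr at it (a sketch; the zkcli.sh path assumes the default HDP Search install location, the hosts are from my cluster, and the start command mirrors the one earlier in this thread):

# Create the /solr chroot znode in ZooKeeper
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181 -cmd makepath /solr

# Then start Solr with the chrooted connect string
./solr start -cloud -p 8983 -z 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181/solr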
Still having the same issue when trying to index with the DirectoryIngestMapper:
Solr server not available on: 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181/solr Make sure that collection [boletines_cba] exists
Does anyone have some insight on how to solve this issue?
Best regards,
Created 03-11-2016 01:09 PM
I've been able to sort out this problem. I had a wrong field in the initParams definition in the solrconfig.xml file; I spotted the error in Solr's logs.
After fixing it, the MapReduce job started working. I wonder why this impacts DirectoryIngestMapper, because I was using that Solr config in other environments for testing and could index without problems through other requestHandlers. It seems the Mapper class depends on that config at some point.
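In case it helps someone else, this is roughly what applying a solrconfig.xml fix looks like on a running SolrCloud (a sketch; the conf directory path and the zkcli.sh location are from my setup and will differ in yours):

# Re-upload the corrected config set to ZooKeeper
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181/solr -cmd upconfig -confdir /opt/solr-configs/boletines_cba/conf -confname boletines_cba

# Reload the collection so the new config takes effect
curl "http://10.1.0.4:8983/solr/admin/collections?action=RELOAD&name=boletines_cba"

# The effective config (including initParams) can then be inspected through the Config API
curl "http://10.1.0.4:8983/solr/boletines_cba/config?wt=json"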
Regards,
Created 04-18-2016 08:35 PM
Hi,
The above code is working fine for me, thank you.
But if more documents land in the same HDFS directory every hour, what would be the best way to index only the new documents in HDFS?