Support Questions


Solr indexing from a folder in HDFS

Expert Contributor

Hi,

I tried to index the files in a folder on HDFS; my Solr configuration is the following:

./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr
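
In case it is relevant, the input folder and the data directory can be checked with plain HDFS commands, for example:

# list the input folder the ingest job should read from
hdfs dfs -ls /user/solr/documents
# list the directory used as solr.data.dir
hdfs dfs -ls /user/solr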

When I launch:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr

I get the following error:

Solr server not available on: http://10.0.2.15:2181/solr
Make sure that collection [Collezione] exists

The collection exists and is valid, but it looks like the job is not able to contact the server.
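
One way to double-check that the collection is really registered is to list the collections through the Collections API (8983 is the port Solr was started on above, so adjust it if yours differs):

curl "http://10.0.2.15:8983/solr/admin/collections?action=LIST&wt=json"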

I'd really appreciate some help in solving this problem.

Davide

1 ACCEPTED SOLUTION


@Davide Isoardi I was able to fix your problem; please try the following solution:

1) Create a JAAS file called jaas.conf

This file can be empty; its contents don't really matter since your environment is not kerberized.
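
For example (the commented-out Client section is only an illustration of what a kerberized setup would need; the keytab path and principal are made-up values):

# a completely empty file is fine on a non-kerberized cluster
touch jaas.conf
# on a kerberized cluster the file would instead contain a Client login section, roughly like:
# Client {
#   com.sun.security.auth.module.Krb5LoginModule required
#   useKeyTab=true
#   keyTab="/etc/security/keytabs/solr.service.keytab"
#   principal="solr/your.host@EXAMPLE.COM";
# };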

2) Start your job with the following command:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

The order of the parameters needs to be the same as in the above command, otherwise the job might not work.

I believe this is a bug; could you please report this issue to Lucidworks? Thanks.


12 REPLIES

New Contributor

Hello everyone!

I'm struggling with the same problem. I've installed Hortonworks 2.3 on 3 machines using the Installation Guide, and after that I installed HDP Search according to the docs as well, so the current state of my configs is pretty much out of the box. I can run all the previous steps properly, but I'm failing at this last one.

The collection exists, my cluster is not kerberized, I'm using all the ZooKeeper instances, and I've tried to run it without the /solr chroot, but nothing works.

Update:

I've also followed the recommended practice to clean up and chroot my SolrCloud, following the post Best Practice: 'chroot' your Solr Cloud in ZooKeeper.
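
(If it helps anyone, the chroot itself can be created with the zkcli script that ships with Solr; the install path below is the HDP Search default, so adjust it to your layout.)

/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181 -cmd makepath /solr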

Still having the same issue when trying to index with the DirectoryIngestMapper:

Solr server not available on: 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181/solr
Make sure that collection [boletines_cba] exists

Does anyone have any insight into how to solve this issue?

Best regards,

New Contributor

I've been able to sort out this problem. I had a wrong field in the initParams definition in the solrconfig.xml file; I spotted the error in Solr's logs.

After fixing it, the MapReduce job started working. I wonder why this impacts the DirectoryIngestMapper, because I was using that Solr config in other environments for testing and was able to index without problems through other requestHandlers. It seems that the Mapper class depends on that config at some point.
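
(For anyone chasing the same thing: the relevant message shows up when grepping the Solr log for initParams; the log path below assumes the HDP Search default install location, so adjust it to yours.)

grep -i "initParams" /opt/lucidworks-hdpsearch/solr/server/logs/solr.log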

Regards,

New Contributor

Hi,

The above command is working fine for me, thank you.

But if more documents land in the same HDFS directory every hour, what would be the best solution to index only the new documents in HDFS?