Created 12-10-2015 02:05 PM
Hi,
I tried to index the files in a folder on HDFS; my solr configuration is the following:
./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr
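For reference, the same HDFS settings can also be made permanent in solrconfig.xml instead of being passed on the command line each start. This is a sketch based on the standard Solr-on-HDFS configuration; the paths mirror the ones above and should be adjusted to your cluster:

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Root HDFS location for index data; matches -Dsolr.data.dir above -->
  <str name="solr.hdfs.home">hdfs://10.0.2.15:8020/user/solr</str>
</directoryFactory>

<!-- HDFS-aware lock type; matches -Dsolr.lock.type=hdfs above -->
<lockType>${solr.lock.type:hdfs}</lockType>
```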
when I launch:
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr
I get the following error:
Solr server not available on: http://10.0.2.15:2181/solr Make sure that collection [Collezione] exists
The collection exists and is valid, but it looks like it is not able to contact the server.
I'd really appreciate some help in solving this problem.
Davide
Created 12-10-2015 02:34 PM
Is your cluster kerberized?
I ran into this error a couple of days ago; until recently, an important piece was missing from the Solr documentation.
Your launch command should look similar to this:
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection MyCollection -i hdfs://hortoncluster/data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr
Make sure you include the JAAS option in a kerberized environment: -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
Created 12-10-2015 04:02 PM
No, my cluster is not kerberized.
Created 12-10-2015 02:40 PM
Hi Davide,
When indexing to SolrCloud, the ZooKeeper list should contain all ZooKeeper instances, plus the ZooKeeper ensemble root directory (chroot) if one was defined. I see your call uses -zk 10.0.2.15:2181/solr. Do you actually have the root directory for the ZK ensemble defined as /solr? If not, remove /solr and try indexing with only the host set: -zk 10.0.2.15:2181
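To illustrate that rule: the -zk / --zkConnect value is just a comma-separated host:port list with an optional chroot appended at the end. A minimal sketch (the host names and chroot below are placeholders, not values from your cluster):

```shell
#!/bin/sh
# Build a ZooKeeper connect string for -zk / --zkConnect.
ZK_HOSTS="horton01.example.com horton02.example.com horton03.example.com"
ZK_CHROOT=""   # set to /solr ONLY if Solr was started with that chroot

ZK_CONNECT=$(printf '%s:2181,' $ZK_HOSTS)   # join each host with :2181,
ZK_CONNECT="${ZK_CONNECT%,}$ZK_CHROOT"      # strip trailing comma, append chroot
echo "$ZK_CONNECT"
# -> horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181
```

If the chroot in the connect string does not exist in ZooKeeper, clients will report the collection as missing even when it exists under the root path.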
Thanks
Created 12-11-2015 01:26 PM
Hi,
my zookeeper paths are:
[zk: localhost:2181(CONNECTED) 5] ls /
[configs, zookeeper, clusterstate.json, aliases.json, live_nodes, rmstore, overseer, overseer_elect, collections]
[zk: localhost:2181(CONNECTED) 6] ls /configs/mycollection
[currency.xml, protwords.txt, synonyms.txt, _rest_managed.json, solrconfig.xml, lang, stopwords.txt, schema.xml]
[zk: localhost:2181(CONNECTED) 7] ls /collections/mycollection
[state.json, leader_elect, leaders]
I created the collection by running:
./solr create -c mycollection -d ../server/solr/configsets/basic_configs/
the indexing command is slightly different from the previous one:
yarn jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181
Created 12-11-2015 05:55 PM
did that work?
Created 12-12-2015 11:19 AM
SolrCloud works, but the command for indexing files in the HDFS folder returns:
Solr server not available on: http://10.0.2.15:2181 Make sure that collection [mycollection] exists
Created 12-13-2015 10:00 PM
@Davide Isoardi I was able to fix your problem; please try the following solution:
1) Create a JAAS file called jaas.conf.
This file can be empty; it doesn't really matter, since your environment is not kerberized.
2) Start your job with the following command:
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr
The order of the parameters needs to be the same as in the command above; otherwise the job might not work.
I believe this is a bug, could you please report this issue to Lucidworks? Thanks.
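For reference, in a kerberized environment the jaas.conf would not be empty; it would contain a Client login section along these lines (the keytab path and principal here are placeholders, not values from your cluster):

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/solr.service.keytab"
  storeKey=true
  useTicketCache=false
  principal="solr/horton01.example.com@EXAMPLE.COM";
};
```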
Created 12-23-2015 07:40 AM
@Davide Isoardi were you able to test the above solution?
Created 02-05-2016 08:18 PM
@Davide Isoardi are you still having issues with this? Can you accept best answer or provide your own solution?