Support Questions

Find answers, ask questions, and share your expertise

Solr indexing from a folder in HDFS

avatar
Expert Contributor

Hi,

I tried to index the files in a folder on HDFS; my Solr startup configuration is the following:

./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr
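For reference, two quick sanity checks (host, port, and HDFS path taken from the start command above; adjust if yours differ):

./solr status                                 # should report a Solr node running in cloud mode on port 8983
hdfs dfs -ls hdfs://10.0.2.15:8020/user/solr  # the HDFS directory passed via -Dsolr.data.dir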

When I launch:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr

I get the following error:

Solr server not available on: http://10.0.2.15:2181/solr
Make sure that collection [Collezione] exists

The collection exists and is valid, but it looks like the job is not able to contact the Solr server.

I'd really appreciate some help in solving this problem.

Davide

1 ACCEPTED SOLUTION

avatar

@Davide Isoardi I was able to fix your problem; please try the following solution:

1) Create a JAAS file called jaas.conf.

This file can be empty; its contents don't really matter since your environment is not kerberized.

2) Start your job with the following command:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

The order of the parameters needs to be the same as in the command above; otherwise, the job might not work. (A sketch adapted to your environment follows below.)

I believe this is a bug; could you please report this issue to Lucidworks? Thanks.
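For reference, a sketch of how this might be adapted to the environment described in this thread (single node 10.0.2.15, collection mycollection, documents under /user/solr/documents). Since your ZooKeeper listing further down shows the Solr znodes directly under the ZK root, no /solr chroot is appended here; please verify these values before running:

touch jaas.conf   # may stay empty on a non-kerberized cluster

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection mycollection -i hdfs://10.0.2.15:8020/user/solr/documents/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect 10.0.2.15:2181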


12 REPLIES

avatar

Is your cluster kerberized?

I saw this error a couple of days ago; until now, an important piece was missing from the Solr documentation.

Your launch command should look similar to this:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection MyCollection -i hdfs://hortoncluster/data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

Make sure you include the JAAS option in a kerberized environment: -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
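For completeness, a rough sketch of what such a jaas.conf might contain on a kerberized cluster; the login context name, principal, and keytab path below are placeholders, so check the Lucidworks HDP Search documentation for the exact values your installation expects:

cat > /opt/lucidworks-hdpsearch/solr/bin/jaas.conf <<'EOF'
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/solr.service.keytab"
  storeKey=true
  useTicketCache=false
  principal="solr/horton01.example.com@EXAMPLE.COM";
};
EOF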

avatar
Expert Contributor

No, my cluster is not kerberized.

avatar
Explorer

Hi Davide,

When indexing to SolrCloud, the ZooKeeper connect string should contain all ZooKeeper instances, plus the ZooKeeper ensemble root directory (chroot) if one was defined. I see that your call uses -zk 10.0.2.15:2181/solr. Can you confirm whether the root directory for the ZK ensemble is defined as /solr? If not, remove /solr so that only the host is set, and try indexing with -zk 10.0.2.15:2181.
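One way to check is with the ZooKeeper CLI (the zkCli.sh path below assumes a typical HDP layout; adjust as needed):

/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server 10.0.2.15:2181
# then, inside the client, see whether the Solr znodes (configs, collections, ...) live under / or under /solr
ls /
ls /solr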

Thanks

avatar
Expert Contributor

Hi,

my ZooKeeper paths are:

[zk: localhost:2181(CONNECTED) 5] ls /      
[configs, zookeeper, clusterstate.json, aliases.json, live_nodes, rmstore, overseer, overseer_elect, collections]
[zk: localhost:2181(CONNECTED) 6] ls /configs/mycollection
[currency.xml, protwords.txt, synonyms.txt, _rest_managed.json, solrconfig.xml, lang, stopwords.txt, schema.xml]
[zk: localhost:2181(CONNECTED) 7] ls /collections/mycollection
[state.json, leader_elect, leaders]

I created the collection by running:

./solr create -c mycollection -d ../server/solr/configsets/basic_configs/
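For reference, one way to double-check that the collection is registered (assuming Solr listens on port 8983 as in the start command at the top of the thread) is the Collections API LIST action:

curl "http://10.0.2.15:8983/solr/admin/collections?action=LIST&wt=json"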

The indexing command is slightly different from the previous one:

yarn jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181
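For reference, confirming that the input directory actually contains documents before launching the job:

hdfs dfs -ls /user/solr/documents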

avatar

Did that work?

avatar
Expert Contributor

SolrCloud works, but the command for indexing files in the HDFS folder returns:

Solr server not available on: http://10.0.2.15:2181
Make sure that collection [mycollection] exists

avatar

@Davide Isoardi I was able to fix your problem; please try the following solution:

1) Create a JAAS file called jaas.conf.

This file can be empty; its contents don't really matter since your environment is not kerberized.

2) Start your job with the following command:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

The order of the parameters needs to be the same as in the command above; otherwise, the job might not work.

I believe this is a bug; could you please report this issue to Lucidworks? Thanks.

avatar

@Davide Isoardi were you able to test the above solution?

avatar
Master Mentor

@Davide Isoardi are you still having issues with this? Can you accept the best answer or provide your own solution?