Archives of Support Questions (Read Only)

This is an archived board, kept read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Solr indexing from a folder in HDFS

avatar
Expert Contributor

Hi,

I tried to index the files in a folder on HDFS; my Solr startup configuration is the following:

./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr
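As a quick sanity check that Solr is actually writing to HDFS (a sketch only; it assumes the HDFS client is on the PATH and uses the paths from the start command above), you can list the data directory after startup:

# Index directories for the collection's cores should appear here after startup
hdfs dfs -ls /user/solr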

When I launch:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr

I get the following error:

Solr server not available on: http://10.0.2.15:2181/solr
Make sure that collection [Collezione] exists

The collection exists and is valid, but it looks like the job is not able to contact the server.

I'd really appreciate some help in solving this problem.

Davide


12 REPLIES

avatar

Is your cluster kerberized?

I saw this error a couple of days ago; until now, an important piece was missing from the Solr documentation.

Your launch command should look similar to this:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection MyCollection -i hdfs://hortoncluster/data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

Make sure you include the JAAS option in a kerberized environment: -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
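For reference, in a kerberized environment the jaas.conf would contain a Kerberos login section along these lines (a sketch only; the section name, keytab path, and principal are placeholder assumptions to adapt to your cluster):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/solr.service.keytab"
  principal="solr/horton01.example.com@EXAMPLE.COM";
};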

avatar
Expert Contributor

No, my cluster is not kerberized.

avatar
New Member

Hi Davide,

When indexing to SolrCloud, the ZooKeeper list should contain all ZooKeeper instances, plus the ZooKeeper ensemble root directory if one was defined. I see that your call uses -zk 10.0.2.15:2181/solr. Can you confirm whether the root directory for the ZK ensemble is defined as /solr? If not, remove /solr so that only the host is set, and try indexing like so: -zk 10.0.2.15:2181
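One quick way to check this (a sketch; it assumes the stock ZooKeeper CLI at its usual HDP location, which may differ on your cluster):

# List the ZooKeeper root; a "solr" znode here would mean the /solr chroot is in use
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server 10.0.2.15:2181 ls /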

Thanks

avatar
Expert Contributor

Hi,

my zookeeper paths are:

[zk: localhost:2181(CONNECTED) 5] ls /      
[configs, zookeeper, clusterstate.json, aliases.json, live_nodes, rmstore, overseer, overseer_elect, collections]
[zk: localhost:2181(CONNECTED) 6] ls /configs/mycollection
[currency.xml, protwords.txt, synonyms.txt, _rest_managed.json, solrconfig.xml, lang, stopwords.txt, schema.xml]
[zk: localhost:2181(CONNECTED) 7] ls /collections/mycollection
[state.json, leader_elect, leaders]

I created the collection by running:

./solr create -c mycollection -d ../server/solr/configsets/basic_configs/
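To double-check that SolrCloud can see the collection (a sketch; it assumes the standard Collections API on the Solr port used above):

# The response should list mycollection among the collections
curl 'http://10.0.2.15:8983/solr/admin/collections?action=LIST&wt=json'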

The indexing command is slightly different from the previous one (following the suggestion above, /solr is removed from the ZooKeeper address):

yarn jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181

avatar

Did that work?

avatar
Expert Contributor

SolrCloud works, but the command for indexing the files in the HDFS folder returns:

Solr server not available on: http://10.0.2.15:2181
Make sure that collection [mycollection] exists

1 ACCEPTED SOLUTION

avatar

@Davide Isoardi I was able to fix your problem; please try the following solution:

1) Create a JAAS file called jaas.conf

The file can be empty; its contents don't really matter, since your environment is not kerberized.
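For example, from the directory you will launch the job from:

# Create an empty jaas.conf next to where the job is launched
touch jaas.conf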

2) Start your job with the following command:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

The parameters must appear in the same order as in the command above; otherwise the job might not work.
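Adapted to the collection and input path from earlier in this thread, the call would look something like this (a sketch only; it keeps the parameter order above and, per the earlier zkCli listing, uses the ZooKeeper root without a /solr chroot):

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect 10.0.2.15:2181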

I believe this is a bug; could you please report the issue to Lucidworks? Thanks.

avatar

@Davide Isoardi were you able to test the above solution?

avatar
Master Mentor

@Davide Isoardi are you still having issues with this? Can you accept the best answer or provide your own solution?