Archives of Support Questions (Read Only)

This is an archived board, kept read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Solr indexing from a folder in HDFS

avatar
Expert Contributor

Hi,

I tried to index the files in a folder on HDFS; my Solr startup configuration is the following:

./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr
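As a quick sanity check that Solr is actually writing to HDFS (a sketch only; it assumes the HDFS client is on the PATH and uses the paths from the start command above), you can list the data directory after startup:

# Index directories for the collection's cores should appear here after startup
hdfs dfs -ls /user/solr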

When I launch:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr

I get the following error:

Solr server not available on: http://10.0.2.15:2181/solr
Make sure that collection [Collezione] exists

The collection exists and is valid, but it looks like the job is not able to contact the server.

I'd really appreciate some help in solving this problem.

Davide


12 REPLIES

avatar

Is your cluster kerberized?

I saw this error a couple of days ago; until now, an important piece was missing from the Solr documentation.

Your launch command should look similar to this:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection MyCollection -i hdfs://hortoncluster/data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

Make sure you include the JAAS option in a kerberized environment: -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
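For reference, in a kerberized environment the jaas.conf would contain a Kerberos login section along these lines (a sketch only; the section name, keytab path, and principal are placeholder assumptions to adapt to your cluster):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/solr.service.keytab"
  principal="solr/horton01.example.com@EXAMPLE.COM";
};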

avatar
Expert Contributor

No, my cluster is not kerberized.

avatar
New Member

Hi Davide,

When indexing to SolrCloud, the ZooKeeper list should contain all ZooKeeper instances, plus the ZooKeeper ensemble root directory if one was defined. I see that your call uses -zk 10.0.2.15:2181/solr. Can you confirm whether the root directory for the ZK ensemble is defined as /solr? If not, remove /solr so that only the host is set, and try indexing like so: -zk 10.0.2.15:2181
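One quick way to check this (a sketch; it assumes the stock ZooKeeper CLI at its usual HDP location, which may differ on your cluster):

# List the ZooKeeper root; a "solr" znode here would mean the /solr chroot is in use
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server 10.0.2.15:2181 ls /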

Thanks

avatar
Expert Contributor

Hi,

my zookeeper paths are:

[zk: localhost:2181(CONNECTED) 5] ls /      
[configs, zookeeper, clusterstate.json, aliases.json, live_nodes, rmstore, overseer, overseer_elect, collections]
[zk: localhost:2181(CONNECTED) 6] ls /configs/mycollection
[currency.xml, protwords.txt, synonyms.txt, _rest_managed.json, solrconfig.xml, lang, stopwords.txt, schema.xml]
[zk: localhost:2181(CONNECTED) 7] ls /collections/mycollection
[state.json, leader_elect, leaders]

I created the collection by running:

./solr create -c mycollection -d ../server/solr/configsets/basic_configs/
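To double-check that SolrCloud can see the collection (a sketch; it assumes the standard Collections API on the Solr port used above):

# The response should list mycollection among the collections
curl 'http://10.0.2.15:8983/solr/admin/collections?action=LIST&wt=json'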

The indexing command is slightly different from the previous one (following the suggestion above, /solr is removed from the ZooKeeper address):

yarn jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181

avatar

Did that work?

avatar
Expert Contributor

SolrCloud works, but the command for indexing the files in the HDFS folder returns:

Solr server not available on: http://10.0.2.15:2181
Make sure that collection [mycollection] exists

1 ACCEPTED SOLUTION

avatar

@Davide Isoardi I was able to fix your problem; please try the following solution:

1) Create a JAAS file called jaas.conf

The file can be empty; its contents don't really matter, since your environment is not kerberized.
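For example, from the directory you will launch the job from:

# Create an empty jaas.conf next to where the job is launched
touch jaas.conf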

2) Start your job with the following command:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

The parameters must appear in the same order as in the command above; otherwise the job might not work.
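Adapted to the collection and input path from earlier in this thread, the call would look something like this (a sketch only; it keeps the parameter order above and, per the earlier zkCli listing, uses the ZooKeeper root without a /solr chroot):

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect 10.0.2.15:2181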

I believe this is a bug; could you please report the issue to Lucidworks? Thanks.

avatar

@Davide Isoardi were you able to test the above solution?

avatar
Master Mentor

@Davide Isoardi are you still having issues with this? Can you accept the best answer or provide your own solution?