<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: solr indexing from folder in hdfs in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98556#M11946</link>
    <description>&lt;P&gt;Hi Davide,&lt;/P&gt;&lt;P&gt;When indexing to SolrCloud, the zk list should contain all ZooKeeper instances plus the ZooKeeper ensemble root directory, if one was defined. I see that your call uses -zk 10.0.2.15:2181/solr. Can you confirm whether the root directory for the ZK ensemble is defined as solr? If not, remove /solr so that only the host is set, and try indexing like so: -zk 10.0.2.15:2181&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Thu, 10 Dec 2015 22:40:24 GMT</pubDate>
    <dc:creator>acesir</dc:creator>
    <dc:date>2015-12-10T22:40:24Z</dc:date>
    <item>
      <title>solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98554#M11944</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I tried to index the files in a folder on HDFS; my solr configuration is the following:&lt;/P&gt;&lt;PRE&gt;./solr start -cloud -s ../server/solr -p 8983 -z 10.0.2.15:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://10.0.2.15:8020/user/solr -Dsolr.updatelog=hdfs://10.0.2.15:8020/user/solr&lt;/PRE&gt;&lt;P&gt;when I launch:&lt;/P&gt;&lt;PRE&gt;hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c Collezione -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181/solr&lt;/PRE&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;PRE&gt;Solr server not available on: http://10.0.2.15:2181/solr
Make sure that collection [Collezione] exists&lt;/PRE&gt;&lt;P&gt;The collection exists and is valid, but it looks like it is not able to contact the server.&lt;/P&gt;&lt;P&gt;I'd really appreciate some help in solving this problem.&lt;/P&gt;&lt;P&gt;Davide&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 22:05:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98554#M11944</guid>
      <dc:creator>isoardi</dc:creator>
      <dc:date>2015-12-10T22:05:43Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98555#M11945</link>
      <description>&lt;P&gt;Is your cluster kerberized?&lt;/P&gt;&lt;P&gt;I saw this error a couple of days ago, and until now an important piece was missing from the Solr documentation.&lt;/P&gt;&lt;P&gt;Your launch command should look similar to this:&lt;/P&gt;&lt;PRE&gt;hadoop jar
/opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar
com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf -cls
com.lucidworks.hadoop.ingest.DirectoryIngestMapper  --collection MyCollection -i
hdfs://hortoncluster/data/* -of
com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect
horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr&lt;/PRE&gt;&lt;P&gt;Make sure you include the JAAS option in a kerberized environment: &lt;EM&gt;-Dlww.jaas.file=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 22:34:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98555#M11945</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-12-10T22:34:53Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98556#M11946</link>
      <description>&lt;P&gt;Hi Davide,&lt;/P&gt;&lt;P&gt;When indexing to SolrCloud, the zk list should contain all ZooKeeper instances plus the ZooKeeper ensemble root directory, if one was defined. I see that your call uses -zk 10.0.2.15:2181/solr. Can you confirm whether the root directory for the ZK ensemble is defined as solr? If not, remove /solr so that only the host is set, and try indexing like so: -zk 10.0.2.15:2181&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 22:40:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98556#M11946</guid>
      <dc:creator>acesir</dc:creator>
      <dc:date>2015-12-10T22:40:24Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98557#M11947</link>
      <description>&lt;P&gt;No, my cluster is not kerberized.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 00:02:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98557#M11947</guid>
      <dc:creator>isoardi</dc:creator>
      <dc:date>2015-12-11T00:02:30Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98558#M11948</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;My ZooKeeper paths are:&lt;/P&gt;&lt;PRE&gt;[zk: localhost:2181(CONNECTED) 5] ls /
[configs, zookeeper, clusterstate.json, aliases.json, live_nodes, rmstore, overseer, overseer_elect, collections]
[zk: localhost:2181(CONNECTED) 6] ls /configs/mycollection
[currency.xml, protwords.txt, synonyms.txt, _rest_managed.json, solrconfig.xml, lang, stopwords.txt, schema.xml]
[zk: localhost:2181(CONNECTED) 7] ls /collections/mycollection
[state.json, leader_elect, leaders]&lt;/PRE&gt;&lt;P&gt;I created the collection by running:&lt;/P&gt;&lt;PRE&gt;./solr create -c mycollection -d ../server/solr/configsets/basic_configs/&lt;/PRE&gt;&lt;P&gt;The indexing command is slightly different from the previous one:&lt;/P&gt;&lt;PRE&gt;yarn jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c mycollection -i /user/solr/documents -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk 10.0.2.15:2181&lt;/PRE&gt;</description>
      <pubDate>Fri, 11 Dec 2015 21:26:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98558#M11948</guid>
      <dc:creator>isoardi</dc:creator>
      <dc:date>2015-12-11T21:26:30Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98559#M11949</link>
      <description>&lt;P&gt;Did that work?&lt;/P&gt;</description>
      <pubDate>Sat, 12 Dec 2015 01:55:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98559#M11949</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-12-12T01:55:25Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98560#M11950</link>
      <description>&lt;P&gt;SolrCloud works, but the command for indexing files in the HDFS folder returns:&lt;/P&gt;&lt;PRE&gt;Solr server not available on: http://10.0.2.15:2181
Make sure that collection [mycollection] exists
&lt;/PRE&gt;</description>
      <pubDate>Sat, 12 Dec 2015 19:19:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98560#M11950</guid>
      <dc:creator>isoardi</dc:creator>
      <dc:date>2015-12-12T19:19:09Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98561#M11951</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/5834/solr-indexing-from-folder-in-hdfs.html#"&gt;@Davide Isoardi&lt;/A&gt; I was able to fix your problem, please try the following solution:&lt;/P&gt;&lt;P&gt;1) Create a JAAS file called jaas.conf&lt;/P&gt;&lt;P&gt;&lt;EM&gt;This file can be empty; it doesn't really matter, since your environment is not kerberized.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;2) Start your job with the following command:&lt;/P&gt;&lt;PRE&gt;hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dlww.jaas.file=jaas.conf -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper --collection test -i file:///data/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr&lt;/PRE&gt;&lt;P&gt;The order of the parameters needs to be the same as in the above command, otherwise the job might not work.&lt;/P&gt;&lt;P&gt;I believe this is a bug; could you please report this issue to Lucidworks? Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Dec 2015 06:00:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98561#M11951</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-12-14T06:00:02Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98562#M11952</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1186/isoardi.html" nodeid="1186"&gt;@Davide Isoardi&lt;/A&gt; were you able to test the above solution?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Dec 2015 15:40:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98562#M11952</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-12-23T15:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98563#M11953</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1186/isoardi.html" nodeid="1186"&gt;@Davide Isoardi&lt;/A&gt; are you still having issues with this? Can you accept the best answer or provide your own solution?&lt;/P&gt;</description>
      <pubDate>Sat, 06 Feb 2016 04:18:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98563#M11953</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-06T04:18:39Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98564#M11954</link>
      <description>&lt;P&gt;Hello everyone! &lt;/P&gt;&lt;P&gt;I'm struggling with the same problem. I've installed Hortonworks 2.3 on 3 machines using the Installation Guide; after that, I installed hdpsearch according to the docs too, so the current state of my configs is pretty much out of the box. I can run all the steps properly, but this last one fails.&lt;/P&gt;&lt;P&gt;The collection exists, my cluster is not kerberized, I'm using all the zk instances, and I've tried running it without the /solr, but nothing worked.&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Update:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I've also followed the best practices to clean and chroot my SolrCloud, as described in this post: &lt;A target="_blank" href="https://community.hortonworks.com/articles/7081/best-practice-chroot-your-solr-cloud-in-zookeeper.html"&gt;Best Practice: 'chroot' your Solr Cloud in ZooKeeper&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;I'm still having the same issue when trying to index with the &lt;STRONG&gt;DirectoryIngestMapper&lt;/STRONG&gt;:&lt;/P&gt;&lt;PRE&gt;Solr server not available on: 10.1.0.4:2181,10.1.0.5:2181,10.1.0.6:2181/solr
Make sure that collection [boletines_cba] exists&lt;/PRE&gt;&lt;P&gt;Does anyone have some insight on how to solve this issue?&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;</description>
      <pubDate>Tue, 08 Mar 2016 23:41:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98564#M11954</guid>
      <dc:creator>pdelboca</dc:creator>
      <dc:date>2016-03-08T23:41:07Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98565#M11955</link>
      <description>&lt;P&gt;I've been able to sort out this problem. I had a wrong field in the initParams definition in the solrconfig.xml file; I found the error in Solr's logs. &lt;/P&gt;&lt;P&gt;After fixing it, the MapReduce job started working. I wonder why this affects DirectoryIngestMapper, because I was using that Solr config in other environments for testing and could index without problems through other requestHandlers. It seems that the Mapper class depends on that config at some point.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 21:09:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98565#M11955</guid>
      <dc:creator>pdelboca</dc:creator>
      <dc:date>2016-03-11T21:09:21Z</dc:date>
    </item>
    <item>
      <title>Re: solr indexing from folder in hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98566#M11956</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;The above code works fine for me, thank you.&lt;/P&gt;&lt;P&gt;But if more documents land in the same HDFS directory every hour, what would be the best way to index only the new documents in HDFS?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Apr 2016 03:35:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/solr-indexing-from-folder-in-hdfs/m-p/98566#M11956</guid>
      <dc:creator>padala_srinivas</dc:creator>
      <dc:date>2016-04-19T03:35:13Z</dc:date>
    </item>
  </channel>
</rss>

