<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Install and run Apache Nutch on existing Hadoop cluster in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54701#M60968</link>
    <description>&lt;P&gt;Nutch is installed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For this, I had to download Ant and build the code. Make sure to set $JAVA_HOME correctly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[hdfs@X.X.X.X&amp;nbsp;apache-nutch-2.3.1]$ ant runtime&lt;/PRE&gt;&lt;P&gt;Because I had to set it up with MongoDB, I made these changes in&amp;nbsp;&lt;SPAN&gt;$NUTCH_HOME/conf/nutch-site.xml&lt;/SPAN&gt;:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;storage.data.store.class&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;org.apache.gora.mongodb.store.MongoStore&amp;lt;/value&amp;gt;
    &amp;lt;description&amp;gt;Default class for storing data&amp;lt;/description&amp;gt;
  &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;Ensure the gora-mongodb dependency is available in $NUTCH_HOME/ivy/ivy.xml by uncommenting the line below in that file:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;$ vim $NUTCH_HOME/ivy/ivy.xml
...
&amp;lt;dependency org="org.apache.gora" name="gora-mongodb" rev="0.5" conf="*-&amp;gt;default" /&amp;gt;
...
&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also, ensure that MongoStore is set as the default datastore in $NUTCH_HOME/conf/gora.properties, and fill in the MongoDB connection details there.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Shilpa&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 12 May 2017 21:13:21 GMT</pubDate>
    <dc:creator>ShilpaSinha</dc:creator>
    <dc:date>2017-05-12T21:13:21Z</dc:date>
    <item>
      <title>Install and run Apache Nutch on existing Hadoop cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54654#M60967</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a 3-node Cloudera cluster running Cloudera 5.9. I want to build a web crawler and therefore want to install Apache Nutch.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone please guide me on how to install it on an existing Hadoop cluster (Hadoop version 2.6.0)?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have downloaded the tar from&amp;nbsp;&lt;A href="http://www.apache.org/dyn/closer.lua/nutch/2.3.1/apache-nutch-2.3.1-src.tar.gz" target="_blank"&gt;http://www.apache.org/dyn/closer.lua/nutch/2.3.1/apache-nutch-2.3.1-src.tar.gz&amp;nbsp;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I extracted the folder, but when I go inside, I see only these files:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[hdfs@X.X.X.X&amp;nbsp;bin]$ pwd
/var/lib/hadoop-hdfs/nutch/apache-nutch-2.3.1/src/bin
[hdfs@X.X.X.X bin]$ ll
total 20
-rwxr-xr-x 1 hdfs hadoop 5453 Jan 10 2016 crawl
-rwxr-xr-x 1 hdfs hadoop 8801 Jan 10 2016 nutch

[hdfs@X.X.X.X&amp;nbsp;apache-nutch-2.3.1]$ ll
total 488
-rw-r--r-- 1 hdfs hadoop 46132 Jan 10 2016 build.xml
-rw-r--r-- 1 hdfs hadoop 82375 Jan 10 2016 CHANGES.txt
drwxr-xr-x 2 hdfs hadoop 4096 May 11 13:23 conf
-rw-r--r-- 1 hdfs hadoop 4903 Jan 10 2016 default.properties
drwxr-xr-x 3 hdfs hadoop 4096 Jan 10 2016 docs
drwxr-xr-x 2 hdfs hadoop 4096 May 11 13:23 ivy
drwxr-xr-x 3 hdfs hadoop 4096 Jan 10 2016 lib
-rw-r--r-- 1 hdfs hadoop 329066 Jan 10 2016 LICENSE.txt
-rw-r--r-- 1 hdfs hadoop 429 Jan 10 2016 NOTICE.txt
drwxr-xr-x 9 hdfs hadoop 4096 Jan 10 2016 src&lt;/PRE&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:35:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54654#M60967</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2022-09-16T11:35:39Z</dc:date>
    </item>
    <item>
      <title>Re: Install and run Apache Nutch on existing Hadoop cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54701#M60968</link>
      <description>&lt;P&gt;Nutch is installed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For this, I had to download Ant and build the code. Make sure to set $JAVA_HOME correctly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[hdfs@X.X.X.X&amp;nbsp;apache-nutch-2.3.1]$ ant runtime&lt;/PRE&gt;&lt;P&gt;Because I had to set it up with MongoDB, I made these changes in&amp;nbsp;&lt;SPAN&gt;$NUTCH_HOME/conf/nutch-site.xml&lt;/SPAN&gt;:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;storage.data.store.class&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;org.apache.gora.mongodb.store.MongoStore&amp;lt;/value&amp;gt;
    &amp;lt;description&amp;gt;Default class for storing data&amp;lt;/description&amp;gt;
  &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;Ensure the gora-mongodb dependency is available in $NUTCH_HOME/ivy/ivy.xml by uncommenting the line below in that file:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;$ vim $NUTCH_HOME/ivy/ivy.xml
...
&amp;lt;dependency org="org.apache.gora" name="gora-mongodb" rev="0.5" conf="*-&amp;gt;default" /&amp;gt;
...
&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also, ensure that MongoStore is set as the default datastore in $NUTCH_HOME/conf/gora.properties, and fill in the MongoDB connection details there.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Shilpa&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 12 May 2017 21:13:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54701#M60968</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2017-05-12T21:13:21Z</dc:date>
    </item>
    <item>
      <title>Re: Install and run Apache Nutch on existing Hadoop cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54896#M60969</link>
      <description>&lt;P&gt;Though Nutch is installed, it is NOT running on Hadoop. It is just installed on the VM.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone help me run Nutch on top of the existing Hadoop cluster?&lt;/P&gt;</description>
      <pubDate>Fri, 19 May 2017 19:18:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/54896#M60969</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2017-05-19T19:18:33Z</dc:date>
    </item>
    <item>
      <title>Re: Install and run Apache Nutch on existing Hadoop cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/58889#M60970</link>
      <description>&lt;P&gt;1. Copy the URL folder to the cluster file system: hadoop fs -put &amp;lt;url folder&amp;gt; &amp;lt;target&amp;gt;&lt;/P&gt;&lt;P&gt;2. Run the deployment jar with Hadoop: hadoop jar &amp;lt;deployment-jar&amp;gt; &amp;lt;classname&amp;gt; other_params&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 06:54:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Install-and-run-Apache-Nutch-on-existing-Hadoop-cluster/m-p/58889#M60970</guid>
      <dc:creator>M1030</dc:creator>
      <dc:date>2017-08-16T06:54:34Z</dc:date>
    </item>
  </channel>
</rss>

