Archives of Support Questions (Read Only)

hadoopsmi · ‎02-29-2016

I want to crawl web URLs using Nutch and store the data in HBase. Can anyone suggest how to do this, with an example? I am new to Nutch.

nsabharwal · ‎02-29-2016

@sivasaravanakumar k Off topic: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_search/index.html

Nutch: http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial

You can use the same approach for a multi-node cluster.

hadoopsmi · ‎02-29-2016

I got this error message:

[root@sandbox ~]# bin/nutch fetch 1456727546-2019589981

Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local522155708_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:322)

nsabharwal · ‎02-29-2016

@sivasaravanakumar k http://nutch.apache.org/

Recommended: Apache Hadoop 2.5.2

I highly recommend taking a look at this: http://stackoverflow.com/questions/4269632/an-alternative-web-crawler-to-nutch

Nutch tutorial: http://cs.boisestate.edu/~amit/research/nutch/Nutch-Hadoop-Cluster-Howto.html

nsabharwal · ‎02-29-2016

@sivasaravanakumar k FYI: Nutch is not part of the HDP stack.

Cloudera Community

Nutch web crawling using HBase in Hortonworks