Created 02-29-2016 04:36 AM
I want to crawl web URLs using Nutch and store the data in an HBase database. Can anyone suggest how to do this, with an example? I am new to Nutch.
Created 02-29-2016 04:57 AM
@sivasaravanakumar k Off topic : http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_search/index.html
Nutch --> http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
You can use the same approach for a multi-node cluster.
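For the HBase part of the question, here is a minimal configuration sketch. It assumes Nutch 2.x built from source with the Gora HBase backend; that setup is an assumption, not something confirmed in this thread. The file locations (conf/nutch-site.xml and conf/gora.properties) follow the usual Nutch 2.x layout, and the gora-hbase dependency typically also has to be enabled in ivy/ivy.xml before building.

```xml
<!-- conf/nutch-site.xml: tell Nutch to persist crawl data through Gora's HBase store -->
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
</configuration>
```

```properties
# conf/gora.properties: make HBase the default Gora datastore
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
```

Exact versions of HBase and Gora that work together depend on the Nutch release, so check the release notes for the version you build.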
Created 02-29-2016 06:39 AM
I got this error message:
[root@sandbox ~]# bin/nutch fetch 1456727546-2019589981
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local522155708_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:322)
Created 02-29-2016 07:49 AM
@sivasaravanakumar k http://nutch.apache.org/
Recommended: Apache Hadoop 2.5.2
I highly recommend taking a look at this: http://stackoverflow.com/questions/4269632/an-alternative-web-crawler-to-nutch
Nutch tutorial: http://cs.boisestate.edu/~amit/research/nutch/Nutch-Hadoop-Cluster-Howto.html
Created 02-29-2016 05:00 AM
@sivasaravanakumar k FYI: Nutch is not part of the HDP stack.