
Nutch web crawling using HBase in Hortonworks

Expert Contributor

I want to crawl web URLs with Nutch and store the data in HBase. Can anyone suggest how to do this, with an example? I am new to Nutch.

Expert Contributor

I got this error message:

[root@sandbox ~]# bin/nutch fetch 1456727546-2019589981

Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local522155708_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:322)

Master Mentor

@sivasaravanakumar k FYI: Nutch is not part of the HDP stack.