Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Nutch web crawling using HBase in Hortonworks

Expert Contributor

I want to crawl web URLs using Nutch and store the data in HBase. Can anyone suggest how to do this, with an example? I am new to Nutch.

1 ACCEPTED SOLUTION

Master Mentor
4 REPLIES

Master Mentor

Expert Contributor

I got this error message:

[root@sandbox ~]# bin/nutch fetch 1456727546-2019589981

Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local522155708_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:322)

Master Mentor

Master Mentor

@sivasaravanakumar k FYI: Nutch is not part of the HDP stack.