Code Repositories (Cloudera Community)
Find and share code repositories
Repo Description

A web crawler built on Apache Spark, using Apache Kafka and Apache Tika, and intended as a replacement for Apache Nutch. It renders JavaScript and processes files with Tika.

https://github.com/uscdataScience/sparkler/wiki/sparkler-0.1

Repo Info
GitHub repo URL: https://github.com/USCDataScience/sparkler
GitHub account name: USCDataScience
Repo name: sparkler
Version history
Revision #: 1 of 1
Last update: 12-18-2016 03:00 AM