Code Repositories
Repo Description

A web crawler built on Apache Spark, using Kafka and Tika, intended as a replacement for Apache Nutch. It renders JavaScript and processes files with Tika.

Repo Info
GitHub repo URL:
GitHub account name: USCDataScience
Repo name: sparkler
Version history
Revision #: 1 of 1
Last update: 12-18-2016 03:00 AM