08-16-2018 04:49 AM - edited 08-16-2018 05:25 AM
I am trying to use Kafka for data ingestion, but being new to this I am fairly confused.
I have multiple crawlers that extract data from web platforms for me, and I want to ingest that extracted data into Hadoop using Kafka without any intermediate scripts/services. The main complication is that the platforms are disparate in nature: one web platform provides real-time data, another is batch based. Can I somehow integrate my crawlers with Kafka producers so that they keep running all by themselves? Is that possible? I think it is, but I am not heading in the right direction. Any help would be appreciated.
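To show what I have in mind, here is a rough sketch of a crawler embedding a producer directly, using the kafka-python client (the topic name, broker address, and `to_record` helper are just placeholders I made up for illustration):

```python
import json

def to_record(item):
    """Serialize one crawled item (a dict) to JSON bytes for Kafka."""
    return json.dumps(item, sort_keys=True).encode("utf-8")

def publish(items, topic, bootstrap="localhost:9092"):
    """Send crawled items to a Kafka topic.

    A real-time crawler would call this per scraped item; a batch
    crawler would call it once per run with the whole batch.
    """
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for item in items:
        producer.send(topic, to_record(item))
    producer.flush()  # block until all buffered records are sent
    producer.close()

if __name__ == "__main__":
    # Hypothetical usage: each crawler publishes to its own topic.
    publish([{"url": "http://example.com", "title": "Example"}],
            topic="crawler-output")
```

Is this the right direction, or should each crawler type (real-time vs. batch) be handled differently on the Kafka side?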