Take a look at NiFi: you can Sqoop into a spooling dir and have Kafka pick it up from there on. Spark Streaming integration with NiFi already exists, and Storm is going to be included soon. Rough idea for your last inquiry:
Sqoop incremental import into an HDFS directory > watch the HDFS dir with NiFi > PutKafka > Storm/Spark
You can also split one flow into two pipes in NiFi, and join two pipes back into one.
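For the Storm/Spark step at the end, here is a rough Spark Streaming sketch of consuming whatever topic PutKafka publishes to; the broker address, topic name and group id below are only placeholders, adapt them to your setup:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object SqoopTopicConsumer {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("nifi-kafka-spark"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",               // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "sqoop-increment-consumers",  // placeholder group
      "auto.offset.reset"  -> "latest"
    )

    // "sqoop_increments" stands in for whatever topic the PutKafka processor writes to
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("sqoop_increments"), kafkaParams))

    // Just print the record values; replace with your real processing
    stream.map(_.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}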
I would prefer Kafka only when data is being pushed from an external system.
Another place where I use Kafka is when the pulled data will be used by multiple parties, so that each consumer connects to the Kafka topic independently.
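To make that multiple-parties point concrete, a minimal sketch: each consuming party uses its own group.id on the same topic, so every party independently receives the full stream (broker, topic and group names are placeholders):

import java.time.Duration
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object FanOutExample {
  // Build one consumer per downstream party; only the group.id differs
  def consumerFor(groupId: String): KafkaConsumer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("group.id", groupId)
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Arrays.asList("ingest_topic"))
    consumer
  }

  def main(args: Array[String]): Unit = {
    val analytics = consumerFor("analytics-team")   // one party
    val archive   = consumerFor("archival-team")    // another party, sees the same data

    // Each party polls its own consumer (normally in its own process or thread)
    analytics.poll(Duration.ofSeconds(1)).asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
  }
}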
When you have control over pulling the data, you can go for custom receivers in Spark and pull only what you can consume.
That avoids the extra overhead of maintaining a Kafka cluster just for balancing the load.
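A rough sketch of such a pull-based custom receiver, following the receiver pattern from the Spark Streaming docs (the TCP host/port source here is only a placeholder for whatever system you control):

import java.io.IOException
import java.net.Socket
import scala.io.Source
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Pulls newline-delimited records from a source we control, at our own pace
class PullReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Pull loop runs on its own thread so onStart() returns immediately
    new Thread("pull-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {
    // Nothing to clean up: the pull loop checks isStopped() and exits on its own
  }

  private def receive(): Unit = {
    var socket: Socket = null
    try {
      socket = new Socket(host, port)
      val lines = Source.fromInputStream(socket.getInputStream, "UTF-8").getLines()
      // Hand records to Spark one at a time: we only pull what we can consume
      while (!isStopped() && lines.hasNext) {
        store(lines.next())
      }
      restart("Source closed the connection, reconnecting")
    } catch {
      case e: IOException => restart("Error pulling data", e)
    } finally {
      if (socket != null) socket.close()
    }
  }
}

Hook it into a streaming job with ssc.receiverStream(new PullReceiver("source-host", 9999)).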
Currently we are implementing a POC in which we need to import real-time data from an RDBMS into Kafka using Attunity. How do we implement this?