Created 04-19-2016 04:58 PM
Some pointers on the performance comparison with Sqoop will help.
Also, what can we leverage for data governance and provenance if we use HDF over Sqoop in such scenarios?
The example database can be MySQL, Oracle, SQL Server, etc.
Synopsis: What would induce me to choose HDF over Sqoop for medium-sized (a few terabytes) relational databases?
Thanks in advance.
Created 04-19-2016 05:32 PM
NiFi documentation seems to indicate transfer rates of around 50 MB/s to 100 MB/s.
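To put those rates in perspective for a database of a few terabytes, here is a rough back-of-the-envelope sketch. The 2 TB size is only an assumed example, the function name is made up for illustration, and the calculation assumes a sustained rate with decimal units (1 TB = 1,000,000 MB), which real transfers rarely hold.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate (decimal units)."""
    size_mb = size_tb * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_s / 3600


# Assumed example size of 2 TB at the two ends of the quoted range.
for rate in (50, 100):
    print(f"2 TB at {rate} MB/s: about {transfer_hours(2, rate):.1f} hours")
```

Under these assumptions, 2 TB works out to roughly 11 hours at 50 MB/s and roughly 5.6 hours at 100 MB/s for a single flow.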
NiFi is useful if there are several source databases from which data needs to be extracted very frequently, as it helps with monitoring and workflow maintenance. If some of this data needs to be routed to different tables based on a column's value, for instance, NiFi is a good choice, as Sqoop won't support this by default.
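As a plain illustration of what routing on a column's value means (this is not NiFi's API, just the idea in ordinary Python), the sketch below groups rows by one column so each group could go to a different target table. The `region` column, the sample rows, and the function name are all made up for the example.

```python
from collections import defaultdict


def route_by_column(rows, column, default="unmatched"):
    """Group rows by the value of one column; each group stands for one target."""
    routed = defaultdict(list)
    for row in rows:
        # Rows that lack the column fall into the default group.
        routed[row.get(column, default)].append(row)
    return dict(routed)


# Hypothetical sample rows.
rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]

for target, batch in route_by_column(rows, "region").items():
    print(target, [r["id"] for r in batch])
```

In a real flow this decision would be made per record by the tool itself; the point is only that the destination depends on the data, which a plain table import does not do.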
If data needs to be moved to multiple destinations, NiFi is also a good choice. For example, you can land data in HDFS while moving part of the data to Kafka/Storm or Spark.
NiFi can schedule these flows easily, while with Sqoop scheduling has to be set up separately with crontab, Control-M, etc.
Sqoop can use mappers in Hadoop for fault tolerance and parallelism, and may achieve better rates. If deduplication etc. is to be performed, then NiFi becomes a choice for smaller data sizes. For large table loads, Sqoop is a good choice.
Created 04-20-2016 02:32 AM
Thank you, nice insights.