Created 02-10-2016 12:45 PM
Hi,
I'm looking to migrate 15 terabytes of data into Hadoop and am considering FTP or Sqoop. Can anyone advise on the maximum volume Sqoop can handle, as I've been told it's not normally used above 10 GB.
Thanks
Leigh
Created 02-10-2016 12:48 PM
The main question is: what is the source of the data?
If it's an RDBMS, then the answer is yes, you can leverage Sqoop to load 15 TB of data.
If it's not an RDBMS, you should look into NiFi or Flume, or, if you just want to load files into HDFS, WebHDFS.
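For the RDBMS case, a minimal Sqoop import sketch would look like the command below. The connection string, credentials, table, split column, and target directory are placeholders, not details from this thread, so adjust them to your environment.

# Placeholder host, service name, table and paths; tune --num-mappers with your DBA.
sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username SCOTT \
  --password-file /user/leigh/.oracle_password \
  --table SALES_HISTORY \
  --target-dir /data/staging/sales_history \
  --split-by SALE_ID \
  --num-mappers 8

--split-by should point at an indexed, evenly distributed column (ideally the primary key) so the mappers get balanced slices of the table.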
Created 02-10-2016 12:56 PM
Thanks. The migration is from Oracle, so it sounds like Sqoop will work fine.
Created 02-10-2016 01:13 PM
@Leigh Perkins Yes. Now, the most critical step is to let your DBA know about this.
Sqoop and Oracle are a great match for this use case :)
Created 02-10-2016 01:20 PM
@Leigh Perkins Also, make sure you have sized your Hadoop cluster accordingly, in terms of both storage and memory.
http://www.slideshare.net/alxslva/effective-sqoop-best-practices-pitfalls-and-lessons-40370936
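As a rough sizing sanity check (assuming the HDFS default replication factor of 3 and no compression): 15 TB x 3 = 45 TB of raw HDFS capacity, plus another 20-30% of headroom for temporary and intermediate data, so on the order of 55-60 TB of usable disk across the cluster. Columnar formats with compression (e.g. ORC or Parquet with Snappy) can reduce that footprint considerably.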
Created 02-10-2016 12:52 PM
The limitation is on the database side, not on Sqoop.
Created 02-10-2016 12:56 PM
Perfect, thanks.
Created 02-10-2016 12:59 PM
@Leigh Perkins Make sure to limit your batches (mapper count and fetch size); otherwise you will kill your DB.
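A hedged sketch of what that looks like in practice: cap the mapper count, set a modest fetch size, and split the job into chunks with --where instead of pulling the whole 15 TB in one run. All of the values, column names and paths below are made-up examples to review with your DBA, not recommendations.

# Example values only; the table, columns and date range are hypothetical.
sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username SCOTT \
  --password-file /user/leigh/.oracle_password \
  --table SALES_HISTORY \
  --where "SALE_DATE >= DATE '2015-01-01' AND SALE_DATE < DATE '2016-01-01'" \
  --target-dir /data/staging/sales_history/2015 \
  --split-by SALE_ID \
  --num-mappers 4 \
  --fetch-size 10000

Running one such chunk at a time, ideally off-peak, keeps the load on Oracle predictable and makes it easy to re-run a single chunk if it fails.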