Created on 04-01-2018 02:56 PM - edited 09-16-2022 06:03 AM
Created on 04-11-2018 06:05 AM - edited 04-11-2018 06:07 AM
If the question is academic in nature, then certainly you can.
If it's instead a real use case and I had to choose between Sqoop and Spark SQL, I'd stick with Sqoop. Sqoop ships with many database-specific connectors that it can use directly, whereas Spark typically goes in via plain old JDBC, so it will be substantially slower and put more load on the database you are extracting from. You can also run into partition size constraints while extracting data. So performance and manageability would certainly be key factors in deciding on the solution.
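To make the partitioning point a bit more concrete, here is a rough sketch of how a range-partitioned JDBC read splits a numeric column into one query per partition. It is a simplified illustration, not Spark's actual implementation. The option names (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are the documented Spark JDBC options, but the helper functions, the connection URL, and the table and column names are made up for the example.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Return one WHERE-clause fragment per partition for an even-stride split.

    Simplified illustration only. The first and last partitions are left
    open-ended, so the bounds shape the split but do not filter out rows.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be less than upper")
    if num_partitions == 1:
        return ["1 = 1"]  # no split: a single query reads everything

    stride = max((upper - lower) // num_partitions, 1)
    # Internal boundaries between consecutive partitions.
    bounds = [lower + i * stride for i in range(1, num_partitions)]

    predicates = []
    for i in range(num_partitions):
        if i == 0:
            predicates.append(f"{column} < {bounds[0]} OR {column} IS NULL")
        elif i == num_partitions - 1:
            predicates.append(f"{column} >= {bounds[-1]}")
        else:
            predicates.append(
                f"{column} >= {bounds[i - 1]} AND {column} < {bounds[i]}"
            )
    return predicates


def jdbc_read_options(url, table, column, lower, upper, num_partitions):
    """Build the option map for a partitioned Spark JDBC read (values as strings)."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for clause in partition_predicates("order_id", 0, 100, 4):
        print(clause)

    opts = jdbc_read_options(
        "jdbc:postgresql://db.example.com:5432/sales",  # hypothetical URL
        "orders", "order_id", 0, 100, 4,
    )
    # With a SparkSession named `spark` and the JDBC driver on the classpath,
    # the read would look roughly like:
    #   df = spark.read.format("jdbc").options(**opts).load()
    print(opts)
```

Because the first and last ranges are open-ended, any rows that fall outside the bounds you chose all land in those two partitions. If the column is skewed or the bounds are badly guessed, a few partitions can end up far larger than the rest, which is where the partition size trouble tends to show up.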
Good luck, and let us know which one you finally chose and how your experience was. Thanks.
Created 04-19-2018 06:49 PM