Can Spark SQL replace Sqoop for Data Ingestion?


New Contributor
 

Accepted Solutions

Re: Can Spark SQL replace Sqoop for Data Ingestion?

Expert Contributor

If the question is academic in nature, then certainly it can.

If it's a real use case and I had to choose between Sqoop and Spark SQL, I'd stick with Sqoop. Sqoop ships with a number of database-specific connectors that it can use directly, while Spark will typically go in via plain old JDBC, so it will usually be substantially slower and put more load on the database being read. You can also run into partition size constraints while extracting data. So performance and manageability would be the key factors in deciding on a solution.
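
To make the JDBC point a bit more concrete, here is a rough sketch in Python of what a partitioned Spark JDBC read looks like. The option names (url, dbtable, partitionColumn, lowerBound, upperBound, numPartitions, fetchsize) are the ones Spark's JDBC data source documents; the URL, table, and column below are made up for illustration, and partition_predicates is only a simplified approximation of how Spark splits the range, not its actual code.

```python
# Sketch only: the URL, table, and column names used here are hypothetical.

def jdbc_read_options(url, table, partition_column, lower, upper,
                      num_partitions, fetch_size=10000):
    """Build the option map for spark.read.format("jdbc")."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # numeric, date, or timestamp column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),  # also caps concurrent DB connections
        "fetchsize": str(fetch_size),          # rows fetched per round trip
    }


def partition_predicates(column, lower, upper, num_partitions):
    """Approximate the WHERE clause carried by each parallel JDBC query.

    Simplified assumption: integer bounds and an evenly divided stride.
    """
    if num_partitions < 2 or upper - lower < num_partitions:
        raise ValueError("sketch expects at least 2 partitions and a wide enough range")
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # First slice is open below the lower bound and also picks up NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last slice is open above the upper bound.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


opts = jdbc_read_options("jdbc:mysql://dbhost:3306/sales", "orders", "id", 0, 1000, 4)
for predicate in partition_predicates("id", 0, 1000, 4):
    print(predicate)

# With a SparkSession named `spark` and the JDBC driver on the classpath:
# df = spark.read.format("jdbc").options(**opts).load()
```

Two things to notice: every partition is its own query over its own connection, which is where the extra load on the database comes from, and the bounds only decide how the range is split, they do not filter rows. If the values are skewed, or most rows sit outside the bounds, one partition can end up much larger than the rest.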

 

Good luck, and let us know which one you finally chose and how your experience was. Thanks!

Re: Can Spark SQL replace Sqoop for Data Ingestion?

New Contributor
Thanks a lot!

In the end, Sqoop. :-)