
Should I use Sqoop to export or import, considering it's still using MapReduce?

Hi Guys,

I am considering Sqoop to import/export data between an RDBMS and HDFS. I found the following issues with Sqoop:

  • It still uses MapReduce as its execution engine, which is slowly dying.
  • Creating a number of mappers to speed up the execution is a tiresome process. Finding a column that is evenly distributed is not easy when you don't have a primary key (Netezza) or when the key is a combination of two columns.
  • HCatalog is not supported in Sqoop version 2.
  • Sqoop version is deprecated by Cloudera; is something similar going to happen with Hortonworks as well?
    https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_sqoop_vs_sqoop2.html

Could someone also tell me the roadmap for Sqoop?

Should I consider writing a Spark script that does the import/export? Would it be faster?

1 REPLY

Sqoop is still the best tool around for fetching data from an RDBMS into HDFS. Let me go through the issues you listed one by one.

1. Sqoop uses MapReduce, which is slow.

First, Sqoop spawns a map-only job, and the sole task of each mapper is to connect to your RDBMS through the JDBC connector and fetch the data. Second, the data fetch use cases Sqoop is meant for are batch oriented, so it is still quite efficient at what it does. It is also a tried and tested tool that has matured over time.
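To make the map-only idea concrete, here is a minimal Python sketch of how the work could be divided: the range of a numeric split column is cut into contiguous slices, and each mapper runs one bounded query over JDBC. This is a simplified model, not Sqoop's actual splitter code, and the table `orders` and column `order_id` are hypothetical names used only for illustration.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer range [lo, hi] into contiguous slices.

    Returns at most num_mappers (start, end) pairs of near-equal width.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    total = hi - lo + 1
    n = min(num_mappers, total)       # never create empty slices
    base, extra = divmod(total, n)    # the first `extra` slices get one more value
    ranges = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def mapper_queries(table, column, lo, hi, num_mappers):
    """Build one bounded SELECT per mapper (illustrative SQL only)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # Hypothetical table and column; the bounds would come from MIN/MAX of the column.
    for query in mapper_queries("orders", "order_id", 1, 100, 4):
        print(query)
```

Each query is independent of the others, which is why no reduce phase is needed.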

2. Setting up the number of mappers is a one-time effort. Yes, you may have to put in some effort to find the best column for your data fetch operations, but that effort will pay off over time with efficient data transfer over the JDBC connection 🙂
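One rough way to judge a candidate split column is to take a sample of its values and count how many rows would land in each mapper's slice if the range between the minimum and maximum were cut into equal-width pieces. The Python sketch below does that under this equal-width assumption; the function name and the sample data are made up for illustration and are not part of Sqoop.

```python
from collections import Counter


def rows_per_mapper(values, num_mappers):
    """Count how many sampled values fall into each equal-width slice.

    `values` is a sample of a numeric candidate column; the result is a
    list with one row count per mapper.
    """
    if not values:
        raise ValueError("values must not be empty")
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo + 1) / num_mappers
    counts = Counter(
        # Clamp to the last slice to guard against floating-point rounding.
        min(int((v - lo) / width), num_mappers - 1)
        for v in values
    )
    return [counts.get(i, 0) for i in range(num_mappers)]


if __name__ == "__main__":
    even = list(range(1, 101))                    # evenly spread keys
    skewed = [1] * 90 + list(range(91, 101))      # most rows share one key
    print(rows_per_mapper(even, 4))
    print(rows_per_mapper(skewed, 4))
```

If one slice holds most of the rows, that mapper does most of the work and adding mappers will not help much, so another column is probably a better choice.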

3. Stick to Sqoop 1. No major distribution has yet opted for Sqoop 2.

4. Even Spark transfers data from an RDBMS using a JDBC connector, which is exactly what Sqoop has been doing all these years. And in Spark you would need to distribute the data load manually, if you wanted to at all, whereas in Sqoop you can achieve that very easily by simply specifying -m/--num-mappers.
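For comparison, this is roughly what "distributing the load manually" looks like on the Spark side. The Python sketch below only builds the option dictionary for Spark's JDBC data source; `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` are the documented option names, while the helper function, the JDBC URL, the table and the column are hypothetical. Running the actual read needs a SparkSession and the database's JDBC driver, so that part is shown as a comment only.

```python
def jdbc_partition_options(url, table, partition_column,
                           lower_bound, upper_bound, num_partitions):
    """Build options for a partitioned read with Spark's JDBC data source.

    The bounds only decide how the partition ranges are cut; they do not
    filter rows out of the result.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper_bound <= lower_bound:
        raise ValueError("upper_bound must be greater than lower_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    opts = jdbc_partition_options(
        url="jdbc:netezza://dbhost.example.com:5480/sales",
        table="orders",
        partition_column="order_id",
        lower_bound=1,
        upper_bound=1000000,
        num_partitions=8,
    )
    print(opts)
    # With a SparkSession named `spark` and the JDBC driver on the classpath:
    # df = spark.read.format("jdbc").options(**opts).load()
```

So the same decisions (which column, which bounds, how many parallel readers) have to be made either way; Sqoop just asks for fewer of them up front.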

Hope that helps!
