Support Questions

Find answers, ask questions, and share your expertise

Are there any benchmarks for SQOOP data transfer rates?


Are there any benchmarks for Sqoop data transfers from an Oracle RDBMS to a Hadoop cluster?

Both the Hadoop cluster and the Oracle servers are located in the same datacenter, connected by a 10G network and 10G ToR switches. What sort of data transfer rates can I realistically expect if I run the transfer at a time when the Oracle servers are not being used by any other applications? I am currently getting around ~200 Mbps, but I am not sure whether that is the maximum I can expect.

1 ACCEPTED SOLUTION

Master Mentor
@Shishir Saxena

I don't think there are any official benchmarks like that.

You can follow this presentation: http://www.slideshare.net/alxslva/effective-sqoop-best-practices-pitfalls-and-lessons-40370936

Also, make sure that you have statistics generated on the Oracle tables.

Another link

Direct mode (the --direct flag) and the number of mappers play a big role.

Your setup looks really good, since the source and target are in the same DC on a 10G network.
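For illustration, a minimal import command exercising both of those knobs might look like this (the JDBC URL, credentials, table, and split column below are placeholders, not values from this thread):

```shell
# Hypothetical example: enable the direct (OraOop) path and set the mapper count.
# All hostnames, credentials, and table/column names are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
  --username SCOTT \
  --password-file /user/scott/.oracle.pw \
  --table SALES.TRANSACTIONS \
  --direct \
  --num-mappers 16 \
  --split-by TXN_ID \
  --target-dir /data/sqoop/transactions
```

Increasing --num-mappers raises parallelism on both the Hadoop and Oracle sides, so it is worth sweeping a few values rather than assuming more is always faster.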


7 REPLIES



Thanks Neeraj. This was useful, though I still don't have a benchmark. In the Quest example, they were able to transfer a 50 GB table in 1000 sec, for an effective rate of ~50 MB/s.

I also found some info here

http://grokbase.com/t/sqoop/user/146jhv8577/sqoop-to-oracle-transfer-rates

and here

http://blog.cloudera.com/blog/2014/11/how-apache-sqoop-1-4-5-improves-oracle-databaseapache-hadoop-i...

In the last case, it looks like a 310 GB table took only 100 seconds (with around 25 mappers) in the best case, for a transfer rate of ~3.1 GB/s. That makes much more sense.
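As a quick sanity check on the figures quoted above, the effective rate is just size divided by wall time; a small awk helper (assuming 1 GB = 1024 MB) makes the unit conversions explicit:

```shell
# Effective transfer rate from table size (GB) and wall-clock seconds.
rate() {
  awk -v gb="$1" -v s="$2" \
    'BEGIN { mbs = gb*1024/s; printf "%.1f MB/s (%.1f Gbit/s)\n", mbs, mbs*8/1024 }'
}

rate 50 1000   # Quest example:    51.2 MB/s (0.4 Gbit/s)
rate 310 100   # Cloudera example: 3174.4 MB/s (24.8 Gbit/s)
```

Note that the second figure would exceed the ~1.25 GB/s ceiling of a single 10G link, so that benchmark presumably ran on a faster or bonded network.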

I will try to find out more details about my Oracle server configuration to see what else I can do to improve my performance.

Master Mentor

@Shishir Saxena OK. I am going to share these numbers based on my experience; these are not official numbers.

5-node cluster with 96 GB RAM and dual 8-core CPUs per node, over a 10G network, from a different datacenter:

4 billion rows with 30 mappers = 40 mins

86 million rows ~ 12 mins

My best suggestion is to run a dummy test and estimate the timings based on that.
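One way to structure such a dummy test is to time a single import and compute the achieved rate from the bytes that actually land in HDFS. A sketch, where all connection details, table names, and paths are placeholders:

```shell
# Hypothetical timing run; connection details and names are placeholders.
start=$(date +%s)
sqoop import \
  --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
  --username SCOTT \
  --password-file /user/scott/.oracle.pw \
  --table SALES.TRANSACTIONS \
  --direct \
  --num-mappers 16 \
  --target-dir /tmp/sqoop_bench \
  --delete-target-dir
secs=$(( $(date +%s) - start ))

# Bytes written to HDFS (first field of `hdfs dfs -du -s`), converted to MB/s.
bytes=$(hdfs dfs -du -s /tmp/sqoop_bench | awk '{print $1}')
awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.1f MB/s\n", b/1048576/s }'
```

Measuring the HDFS output size rather than trusting the source-side table size also accounts for any format or compression differences on the Hadoop side.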


Thank you, Neeraj. I am running benchmarks on our cluster; I just wanted to understand what upper limit I can target. Thanks again for the quick response and all the help.

Master Guru

Hi @Shishir Saxena, the Oracle connector for Hadoop, the so-called OraOop, is included in Sqoop 1.4.5 and 1.4.6 (shipped with HDP-2.3.x). The Sqoop user guide has a very detailed explanation of it. It is enabled when "--direct" is used. Regarding benchmarks, it is best to build your own, for example by running Sqoop with and without OraOop, with different numbers of mappers, various table sizes, etc.
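A simple way to build such a benchmark is to sweep the two main knobs, OraOop on/off and the mapper count, against one representative table. A sketch, where the JDBC URL, credentials, and table/column names are placeholders:

```shell
# Hypothetical benchmark sweep; all names and URLs are placeholders.
for direct in "" "--direct"; do
  for m in 4 8 16 32; do
    start=$(date +%s)
    sqoop import \
      --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
      --username SCOTT --password-file /user/scott/.oracle.pw \
      --table SALES.TRANSACTIONS \
      --split-by TXN_ID \
      --num-mappers "$m" $direct \
      --target-dir /tmp/sqoop_bench \
      --delete-target-dir
    echo "direct=${direct:-off} mappers=$m seconds=$(( $(date +%s) - start ))"
  done
done
```

Running each combination against the same table and target directory keeps the comparison fair; --delete-target-dir clears the output between runs.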


Thanks. Looks like that is my only choice.