Are there any benchmarks for Sqoop data transfer rates?

Are there any benchmarks for Sqoop data transfers from an Oracle RDBMS to a Hadoop cluster?

The Hadoop cluster and the Oracle servers are located in the same datacenter and connected by a 10G network with 10G top-of-rack (TOR) switches. What data transfer rates can I realistically expect if I run the transfer at a time when the Oracle servers are not being used by any other applications? I am getting around 200 Mbps, but I am not sure whether that is the maximum I can expect.
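As a rough reference point, here is a small sketch that compares the measured rate with the nominal capacity of the 10G link. Because "Mbps" is sometimes used loosely for megabytes per second, both readings are computed. The 10 Gbit/s figure is the nominal line rate and ignores protocol overhead, so treat the percentages as upper-bound ratios, not exact utilization.

```python
# Sketch: what fraction of a 10 Gbit/s link does a measured rate of ~200 use?
# The unit is ambiguous in casual usage, so both readings are computed.

LINK_GBIT_PER_S = 10  # nominal line rate of the 10G network (protocol overhead ignored)


def link_utilization(rate_mbit_per_s: float, link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Return the fraction of the link's nominal capacity used by a rate given in Mbit/s."""
    return rate_mbit_per_s / (link_gbit_per_s * 1000)


measured = 200  # the "~200 Mbps" figure from the question

as_megabits = link_utilization(measured)       # read as 200 megabits per second
as_megabytes = link_utilization(measured * 8)  # read as 200 megabytes per second (8 bits per byte)

print(f"200 Mbit/s -> {as_megabits:.0%} of a 10 Gbit/s link")
print(f"200 MB/s   -> {as_megabytes:.0%} of a 10 Gbit/s link")
```

Either way, the link itself is far from saturated, which suggests the bottleneck is more likely on the database or job side than on the network.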

Accepted Solutions

Re: Are there any benchmarks for SQOOP data transfer rate ?

@Shishir Saxena

I don't think there are any benchmarks like that.

You can follow the best practices in this presentation: http://www.slideshare.net/alxslva/effective-sqoop-best-practices-pitfalls-and-lessons-40370936

Also, make sure that statistics have been gathered on the Oracle tables.

Another link

Direct mode ("--direct") and the number of mappers ("--num-mappers") play a big role.
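A minimal sketch of what such a command could look like, built in Python so the mapper count is easy to vary. The JDBC URL, user name, table and target directory are hypothetical placeholders; "--connect", "--username", "-P", "--table", "--target-dir", "--num-mappers" and "--direct" are standard Sqoop import arguments. The right mapper count depends on how many parallel sessions the Oracle server can serve, so 16 is only an example value.

```python
# Sketch: assemble a Sqoop import command with direct mode on and an explicit mapper count.
# Connection string, user, table and target directory below are hypothetical placeholders.
import shlex


def build_sqoop_import(connect: str, username: str, table: str, target_dir: str,
                       num_mappers: int, direct: bool = True) -> list[str]:
    """Return the argument list for a 'sqoop import' run."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    cmd = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "-P",                                # prompt for the password instead of passing it inline
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),   # number of parallel map tasks (and DB sessions)
    ]
    if direct:
        cmd.append("--direct")               # direct mode; for Oracle this uses OraOop in Sqoop 1.4.5+
    return cmd


cmd = build_sqoop_import(
    connect="jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL",  # hypothetical
    username="scott",                                                # hypothetical
    table="SALES",                                                   # hypothetical
    target_dir="/data/raw/sales",                                    # hypothetical
    num_mappers=16,
)
print(shlex.join(cmd))
```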

Your setup looks really good, since the source and target are in the same datacenter and connected by a 10G network.

Re: Are there any benchmarks for SQOOP data transfer rate ?

Thanks, Neeraj. This was useful, though I still don't have a benchmark. In the Quest example, they were able to transfer a 50 GB table in 1000 seconds, an effective rate of 50 MB/s (400 Mbps).
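The conversion from "size moved in N seconds" to a rate is easy to get wrong by a factor of 8 (bytes vs. bits). A small sketch, assuming decimal units (1 GB = 10^9 bytes), since the thread does not say how the sizes were reported:

```python
# Sketch: convert "bytes moved in N seconds" into throughput, both in bytes and in bits.
# Decimal units (1 GB = 10**9 bytes) are an assumption about how the sizes were reported.

GB = 10**9


def throughput(size_bytes: float, seconds: float) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second)."""
    bytes_per_s = size_bytes / seconds
    return bytes_per_s / 10**6, bytes_per_s * 8 / 10**6


mb_s, mbit_s = throughput(50 * GB, 1000)          # the Quest example: 50 GB in 1000 s
print(f"{mb_s:.0f} MB/s = {mbit_s:.0f} Mbit/s")   # prints "50 MB/s = 400 Mbit/s"
```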

I also found some info here

http://grokbase.com/t/sqoop/user/146jhv8577/sqoop-to-oracle-transfer-rates

and here

http://blog.cloudera.com/blog/2014/11/how-apache-sqoop-1-4-5-improves-oracle-databaseapache-hadoop-i...

In the last case, it looks like a 310 GB table took only 100 seconds (with around 25 mappers) in the best case, a transfer rate of ~3.1 GB/s. That makes much more sense.
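When comparing against a published figure, it can help to break the aggregate rate down per mapper and check it against the network you actually have. A rough sketch using the numbers quoted above (310 GB, 100 s, around 25 mappers), assuming decimal units and a nominal 10 Gbit/s link with no protocol overhead:

```python
# Sketch: break an aggregate import rate down per mapper and compare it with one 10 Gbit/s link.
# Decimal units assumed (1 GB = 1000 MB); the link figure is nominal, before protocol overhead.

def per_mapper_rate(size_gb: float, seconds: float, mappers: int) -> tuple[float, float]:
    """Return (aggregate MB/s, MB/s per mapper)."""
    aggregate_mb_s = size_gb * 1000 / seconds
    return aggregate_mb_s, aggregate_mb_s / mappers


TEN_GBIT_LINK_MB_S = 10 * 1000 / 8   # 1250 MB/s nominal

aggregate, per_mapper = per_mapper_rate(310, 100, 25)   # figures quoted from the Cloudera blog post
print(f"aggregate: {aggregate:.0f} MB/s, per mapper: {per_mapper:.0f} MB/s")
print(f"fits in one 10G link: {aggregate <= TEN_GBIT_LINK_MB_S}")
```

The aggregate works out to more than a single 10 Gbit/s link can carry (1250 MB/s), so it may be worth checking the hardware behind that benchmark before using it as a target for a 10G setup.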

I will try to find out more details about my Oracle server configuration to see what else I can do to improve my performance.

Re: Are there any benchmarks for SQOOP data transfer rate ?

@Shishir Saxena OK. I'll share some numbers based on my own experience; these are not official numbers.

5-node cluster with 96 GB RAM and dual 8-core CPUs, over a 10G network from a different datacenter:

4 billion rows with 30 mappers: 40 minutes
86 million rows: ~12 minutes
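Row counts and wall-clock times can be turned into row rates, which are easier to compare across tables. Since the row width is not given, only rows per second can be derived, not bytes per second; the mapper count for the 86 million row run is not stated, so no per-mapper figure is computed for it.

```python
# Sketch: turn the reported row counts and elapsed times into row rates.
# Row width is unknown, so this yields rows/s only, not bytes/s.

def rows_per_second(rows: int, minutes: float) -> float:
    """Average rows per second over the whole run."""
    return rows / (minutes * 60)


big = rows_per_second(4_000_000_000, 40)   # 4 billion rows, 30 mappers, 40 minutes
small = rows_per_second(86_000_000, 12)    # 86 million rows, ~12 minutes (mapper count not given)

print(f"4B rows:  {big:,.0f} rows/s total, {big / 30:,.0f} rows/s per mapper")
print(f"86M rows: {small:,.0f} rows/s total")
```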

My best suggestion is to run a test import and estimate the timings from its results.
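One way to extrapolate from such a test run is sketched below. It assumes elapsed time grows linearly with row count after a fixed startup cost (job submission, split calculation), and that the trial uses the same mapper count and options as the real import. Real imports can deviate, for example with skewed splits or once the database or link saturates. The trial numbers are hypothetical.

```python
# Sketch: extrapolate a full-table import time from a smaller trial run.
# Assumption: time = fixed startup cost + rows / steady-state rate (linear scaling).

def estimate_seconds(trial_rows: int, trial_seconds: float, total_rows: int,
                     startup_seconds: float = 0.0) -> float:
    """Scale the trial's steady-state rate up to total_rows, then add back the startup cost."""
    if trial_seconds <= startup_seconds:
        raise ValueError("trial must run longer than the startup cost")
    rate = trial_rows / (trial_seconds - startup_seconds)  # rows per second once running
    return startup_seconds + total_rows / rate


# Hypothetical trial: 10 million rows in 300 s, of which ~60 s were job startup.
eta = estimate_seconds(10_000_000, 300, 400_000_000, startup_seconds=60)
print(f"estimated full import: {eta / 60:.1f} minutes")
```

Separating out the startup cost matters mostly for short trials, where it can dominate the measured time and make the extrapolation overly pessimistic.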

Re: Are there any benchmarks for SQOOP data transfer rate ?

Thank you, Neeraj. I am running benchmarks on our cluster; I just wanted to understand what upper limit I can target. Thanks again for the quick response and all the help.

Re: Are there any benchmarks for SQOOP data transfer rate ?

Hi @Shishir Saxena, the Oracle connector for Hadoop, the so-called OraOop, is included in Sqoop 1.4.5 and 1.4.6 (shipped with HDP-2.3.x). The Sqoop user guide has a very detailed explanation here. It's enabled when "--direct" is used. Regarding benchmarks, it's best to build your own: for example, run Sqoop with and without OraOop, with different numbers of mappers, various table sizes, etc.
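The benchmark plan above can be laid out as a small test matrix. This sketch enumerates every combination of direct mode on/off, mapper count and table, and times each run. The connection string, user, password file path, table names and target directories are hypothetical. With the default dry_run=True it only lists the commands without executing anything. Each combination writes to its own target directory, which must not already exist, because Sqoop refuses to import into an existing directory.

```python
# Sketch of a benchmark matrix: with and without "--direct" (OraOop), several mapper counts,
# and several table sizes. Connection details, tables and paths are hypothetical placeholders.
import itertools
import subprocess
import time

CONNECT = "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL"  # hypothetical
TABLES = ["BENCH_1GB", "BENCH_10GB"]                               # hypothetical test tables
MAPPERS = [4, 8, 16]
DIRECT = [False, True]


def sqoop_cmd(table: str, mappers: int, direct: bool) -> list[str]:
    """Build one import command; each combination gets its own target directory."""
    mode = "direct" if direct else "jdbc"
    cmd = ["sqoop", "import", "--connect", CONNECT, "--username", "bench",
           "--password-file", "/user/bench/.oracle.pw",        # hypothetical HDFS path
           "--table", table, "--num-mappers", str(mappers),
           "--target-dir", f"/tmp/bench/{table}_{mappers}_{mode}"]
    if direct:
        cmd.append("--direct")
    return cmd


def run_matrix(dry_run: bool = True) -> list[tuple[str, int, bool, float]]:
    """Run (or, in dry-run mode, just enumerate) each combination and record elapsed seconds."""
    results = []
    for table, mappers, direct in itertools.product(TABLES, MAPPERS, DIRECT):
        cmd = sqoop_cmd(table, mappers, direct)
        start = time.perf_counter()
        if not dry_run:
            subprocess.run(cmd, check=True)  # runs sequentially so jobs don't compete for Oracle
        results.append((table, mappers, direct, time.perf_counter() - start))
    return results


for row in run_matrix(dry_run=True):
    print(row)
```

Running the combinations one at a time keeps each measurement isolated; dividing each table's size by its elapsed time then gives directly comparable rates.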

Re: Are there any benchmarks for SQOOP data transfer rate ?

Thanks. Looks like that is my only choice.
