Member since: 02-16-2016
Posts: 176
Kudos Received: 197
Solutions: 17
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3057 | 11-18-2016 08:48 PM |
| | 5430 | 08-23-2016 04:13 PM |
| | 1492 | 03-26-2016 12:01 PM |
| | 1420 | 03-15-2016 12:12 AM |
| | 15232 | 03-14-2016 10:54 PM |
02-20-2016
11:05 PM
1 Kudo
Thanks, Neeraj. This was useful, though I still don't have a benchmark. In the Quest example, they transferred a 50 GB table in 1000 seconds, an effective rate of 50 MB/s (about 400 Mbps). I also found some information here http://grokbase.com/t/sqoop/user/146jhv8577/sqoop-to-oracle-transfer-rates and here http://blog.cloudera.com/blog/2014/11/how-apache-sqoop-1-4-5-improves-oracle-databaseapache-hadoop-integration/ In the latter case, it looks like a 310 GB table took only 100 seconds (with around 25 mappers) in the best case, a transfer rate of ~3.1 GB/s. That makes much more sense. I will try to find out more about my Oracle server configuration to see what else I can do to improve performance.
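Since it is easy to mix up bytes and bits when comparing these figures, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB, 1 byte = 8 bits); the function name `transfer_rate` is only illustrative, and the sizes and durations are the ones quoted above.

```python
def transfer_rate(size_gb, seconds):
    """Return (megabytes per second, megabits per second), assuming decimal units."""
    mb_per_s = size_gb * 1000 / seconds   # 1 GB = 1000 MB (assumption)
    mbit_per_s = mb_per_s * 8             # 1 byte = 8 bits
    return mb_per_s, mbit_per_s


# Table size in GB and elapsed seconds, as quoted in the post
examples = [
    ("Quest example", 50, 1000),
    ("Cloudera blog, best case", 310, 100),
]

for label, size_gb, seconds in examples:
    mb_s, mbit_s = transfer_rate(size_gb, seconds)
    print(f"{label}: {mb_s:.0f} MB/s = {mbit_s:.0f} Mbit/s")
```

For comparison, a 10 Gbit/s link tops out at roughly 1250 MB/s before protocol overhead.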
02-20-2016
09:56 PM
2 Kudos
Are there any benchmarks for Sqoop data transfers from an Oracle RDBMS to a Hadoop cluster? Both the Hadoop cluster and the Oracle servers are located in the same datacenter, connected by a 10G network and 10G TOR switches. What data transfer rates can I realistically expect if I run the transfer at a time when the Oracle servers are not being used by any other application? I am getting a rate of around ~200Mbps, but I am not sure whether that is the maximum I can expect.
Labels: Apache Sqoop
02-20-2016
08:32 PM
1 Kudo
Thanks Neeraj.
02-19-2016
03:36 PM
1 Kudo
@Neeraj Sabharwal @Artem Ervits The table exists in the Oracle database, and the user has access to it: the same sqoop command works without the --direct option. The problem was that SELECT_CATALOG_ROLE had not been granted to the user. The --direct option requires access to the Oracle catalog tables in addition to the actual table.
02-19-2016
03:31 PM
2 Kudos
The user ID used to log in to Oracle must have SELECT_CATALOG_ROLE in order to use the --direct option.
02-19-2016
03:29 PM
1 Kudo
Connecting to an Oracle database with the --direct option gives an error:
sqoop import --options-file db.config --table table1 --direct -m 4
ORA-00942: table or view does not exist
The same sqoop command works fine without the --direct option:
sqoop import --options-file db.config --table table1 -m 4
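To make the difference between the two invocations explicit, here is a small sketch that assembles both command lines from the same arguments. The helper name `sqoop_import_args` is hypothetical; the flags, the options file `db.config`, and the table name are the ones from the commands in this post. It only prints the commands and does not run Sqoop.

```python
import shlex


def sqoop_import_args(table, mappers, options_file="db.config", direct=False):
    """Build the argument list for a `sqoop import` call (hypothetical helper)."""
    args = ["sqoop", "import", "--options-file", options_file, "--table", table]
    if direct:
        # The only difference between the failing and the working command
        args.append("--direct")
    args += ["-m", str(mappers)]
    return args


direct_cmd = shlex.join(sqoop_import_args("table1", 4, direct=True))
plain_cmd = shlex.join(sqoop_import_args("table1", 4))
print(direct_cmd)
print(plain_cmd)
```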
Labels: Apache Sqoop
02-19-2016
02:02 PM
Thanks, @Artem Ervits. This is what I originally intended, but the ExecuteProcess processor doesn't have any option to specify Kerberos credentials.
02-19-2016
01:24 PM
1 Kudo
Thank you for the link above.
02-19-2016
01:23 PM
Thank you for clarifying.
02-19-2016
06:00 AM
8 Kudos
There are two different ways of accessing HDFS over HTTP.

Using WebHDFS:
http://<active-namenode-server>:<namenode-port>/webhdfs/v1/<file-path>?op=OPEN

Using HttpFs:
http://<hadoop-httpfs-server>:<httpfs-port>/webhdfs/v1/<file-path>?op=OPEN

WebHDFS
Pros:
- Built in with the default Hadoop installation.
- Efficient, because load is streamed from each data node.
Cons:
- Does not work if high availability is enabled on the cluster unless the active NameNode is specified.

HttpFs
Pros:
- Works with HA-enabled clusters.
Cons:
- Needs to be installed as an additional service.
- Impacts performance, because data is streamed from a single node.
- Creates a single point of failure.

Additional performance implications of WebHDFS vs HttpFs: https://www.linkedin.com/today/post/article/20140717115238-176301000-accessing-hdfs-using-the-webhdfs-rest-api-vs-httpfs

Major difference between WebHDFS and HttpFs: WebHDFS needs access to all nodes of the cluster, and when data is read it is transmitted directly from the node that holds it, whereas with HttpFs a single node acts as a "gateway" and is the single point of data transfer to the client. So HttpFs can be choked during a large file transfer, but the upside is that it minimizes the footprint required to access HDFS.
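Because both services expose the same /webhdfs/v1 path, the two URL templates above differ only in host and port. A minimal sketch of building them follows; the host names are placeholders, and the ports 50070 (NameNode web port) and 14000 (HttpFs) are assumed defaults that may differ on your cluster. The function name `hdfs_open_url` is only illustrative.

```python
from urllib.parse import quote


def hdfs_open_url(host, port, file_path):
    """Build an op=OPEN URL on the /webhdfs/v1 REST path."""
    # quote() keeps "/" separators and escapes characters such as spaces
    encoded_path = quote(file_path.lstrip("/"))
    return f"http://{host}:{port}/webhdfs/v1/{encoded_path}?op=OPEN"


# Placeholder hosts and assumed default ports; replace with your cluster's values
webhdfs_url = hdfs_open_url("namenode.example.com", 50070, "/data/sample.txt")
httpfs_url = hdfs_open_url("httpfs.example.com", 14000, "/data/sample.txt")

print(webhdfs_url)
print(httpfs_url)
```

A client pointed at the WebHDFS URL also needs network access to the data nodes, since the read is served from them, while a client using the HttpFs URL only talks to the gateway host.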