Sqoop from DB2 when key is composite and non-numeric.




We are in the very early stages of our integration with Hadoop and we have two key interests:


Ingestion into Hadoop from DB2 with Sqoop

We are heavy users of DB2. For the last month we have been trying to ingest data with Sqoop and the JDBC connector, but Sqoop does not appear to take full advantage of parallelism when the DB2 table has (1) a composite key whose (2) first field is not numeric, which happens to be the case for all of our DB2 tables. If we run several processes (say 10) in parallel to ingest a table quickly and the key is not a single numeric field, Sqoop effectively runs 10 full scans of the table, so we see no real performance gain while consuming MIPS like crazy.
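For context, our understanding is that Sqoop parallelises an import by querying the minimum and maximum of a single split column and handing each mapper one slice of that range. The Python sketch below is a simplified illustration of that idea, not Sqoop's actual code; the function names and the ORDER_ID column are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the closed integer range [lo, hi] into contiguous slices.

    Returns at most num_mappers (start, end) pairs that cover the range
    exactly once, with slice sizes differing by at most one.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    total = hi - lo + 1
    n = min(num_mappers, total)      # never create empty slices
    base, extra = divmod(total, n)   # the first `extra` slices get one more value
    ranges = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, lo, hi, num_mappers):
    """Build one range predicate per mapper for a numeric split column."""
    return [
        f"{column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # ORDER_ID is a hypothetical numeric key, used only for illustration.
    for clause in where_clauses("ORDER_ID", 1, 100, 10):
        print(clause)
```

With a single numeric key, each mapper can be given one such range predicate. Our keys start with a non-numeric field, so we have no obvious column to slice this way.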


Optimised connection to DB2

Along the same lines, we are using a JDBC connector to ingest from DB2 into Hadoop through Sqoop, and we were wondering whether there is a DB2-optimised “direct” connector (or plans for one) that accesses the DB2 database files directly, equivalent to what mysqldump provides for MySQL.


We have heard of the vStorm Enterprise solution. Does anyone have experience with it, or can anyone recommend a free alternative?