We are in the very early stages of our integration with Hadoop and we have two key interests:
Ingestion into Hadoop from DB2 with Sqoop
We are heavy users of DB2. For the last month we have been trying to ingest data with Sqoop over the JDBC connector, but Sqoop is apparently unable to exploit its parallelism when the DB2 table we want to ingest has (1) a composite key where (2) the first field is not numeric — which, as it happens, is the situation with all our DB2 tables. If we run several mapper processes (say 10) in parallel against such a table, what Sqoop effectively does is run 10 full scans of the table: we get no real performance benefit whilst consuming MIPS like crazy.
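To be concrete, the kind of workaround we have been looking at is forcing the split column by hand. This is only a sketch — the host, schema, table, and column names below are placeholders, not our real ones:

```shell
# Hypothetical Sqoop invocation: override the composite key and split on a
# single, evenly distributed numeric column instead. All names are placeholders.
sqoop import \
  --connect jdbc:db2://db2host:50000/MYDB \
  --username dbuser \
  --password-file /user/hadoop/.db2pass \
  --table MYSCHEMA.MYTABLE \
  --split-by NUMERIC_COL \
  --num-mappers 10 \
  --target-dir /data/mytable

# If no suitable numeric column exists, --boundary-query lets you hand Sqoop
# the min/max split boundaries yourself, e.g. over a synthetic range:
#   --boundary-query "SELECT 1, 1000000 FROM SYSIBM.SYSDUMMY1"
```

The problem, of course, is that without a genuinely numeric, well-distributed column to split on, there is nothing sensible to pass to --split-by, which is exactly the situation we are in.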
Optimised connection to DB2
Along the same lines, we are using the JDBC connector to move data from DB2 into Hadoop via Sqoop, and we were wondering whether there is (or there are plans for) a DB2-optimised "direct" connector that accesses the DB2 database files directly, equivalent to what Sqoop's direct mode does with MySQL's mysqldump.
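For comparison, this is roughly what the MySQL direct path we are referring to looks like — Sqoop's --direct flag bypasses JDBC and shells out to mysqldump. The connection details and table name here are placeholders:

```shell
# MySQL direct-mode import via mysqldump (the equivalent of what we would
# like for DB2). Host, database, and table names are placeholders.
sqoop import \
  --connect jdbc:mysql://mysqlhost/shop \
  --table orders \
  --direct \
  --num-mappers 4 \
  --target-dir /data/orders
```

As far as we can tell there is no such direct mode for DB2, hence the question.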
We have heard of the vStorm Enterprise solution, and I wonder whether anyone has experience with it or can recommend any other free alternative.