What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

We would like an efficient, near-real-time method to ingest mainframe data into HDFS. An open source solution is preferred, but we will also accept third-party suggestions.

1 ACCEPTED SOLUTION

Master Guru

Sqoop might work or, if you want to be closer to real time, IBM CDC. Change Data Capture from IBM also has a Hadoop connector, and I would hope the two can talk to each other. (Mainframe versions are sometimes very different, but I would assume you can forward the changes to a normal CDC instance, which should then have the BigData connector.)

http://www-03.ibm.com/software/products/en/ibminfochandatacaptforzos

Depending on how you want the data to land in HDFS, there is, for example, a WebHDFS connector in IBM CDC (now InfoSphere Data Replication?).

https://www.ibm.com/developerworks/community/files/app?lang=en#/file/c04518fb-8ff3-4b2a-9fb9-3873347...
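
For readers unfamiliar with WebHDFS: it is the REST interface to HDFS, and writing a file is a two-step HTTP PUT (the NameNode answers the first PUT with a redirect to a DataNode, and the data is sent in a second PUT to that location). The sketch below only builds the first-step URL. It is not taken from the IBM connector; the host name, port, path, and user are placeholder assumptions, and the NameNode HTTP port depends on your Hadoop version and configuration.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=False):
    """Build the first-step URL of a WebHDFS CREATE request.

    A client sends an HTTP PUT (without data) to this URL; the NameNode
    replies with a redirect to a DataNode, and the file content goes in
    a second PUT to that redirect location.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute HDFS path")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    # quote() keeps "/" intact and escapes characters such as spaces.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical values, for illustration only.
url = webhdfs_create_url(
    "namenode.example.com", 50070, "/data/db2/orders/part-0001.csv", "etl"
)
print(url)
```

On a secured cluster the `user.name` parameter would typically be replaced by Kerberos or delegation-token authentication, so treat this as a starting point only.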

6 REPLIES

Master Mentor

@Scott Shaw For near real time I would most likely leverage the FTP processors in NiFi. Since it's DB2, for CDC replication I would most likely use Attunity Replicate, which happens to be a good partner of HWX, or GoldenGate (PDF).

Master Mentor

Open source Sqoop, or Syncsort.

Master Mentor

@Geoffrey Shelton Okot He requested near real time. Depending on what the definition of near real time is, Sqoop granularity can be as low as 5 minutes; anything less and jobs will step on each other or kill the RDBMS.
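
To make the scheduling point concrete, here is a minimal sketch of how a periodic Sqoop pull could be assembled and guarded against overlapping runs. It assumes an append-only table with a monotonically increasing key column, which is what Sqoop's `--incremental append` mode expects. The JDBC URL, table, column, paths, and the `can_start` helper are hypothetical placeholders; the 300-second floor simply mirrors the 5-minute figure above.

```python
import shlex


def sqoop_append_cmd(jdbc_url, table, check_column, last_value,
                     target_dir, username, password_file, mappers=4):
    """Return the argument list for one incremental Sqoop import."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--incremental", "append",       # only rows beyond last_value
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),   # parallel connections to DB2
    ]


def can_start(previous_running, seconds_since_last_start, min_interval_s=300):
    """Hypothetical guard: skip a run if the last one is still going
    or if the minimum interval has not elapsed yet."""
    return (not previous_running) and seconds_since_last_start >= min_interval_s


# Placeholder connection details, for illustration only.
cmd = sqoop_append_cmd(
    jdbc_url="jdbc:db2://zos.example.com:446/DSNLOC1",
    table="ORDERS",
    check_column="ORDER_ID",
    last_value=1000,
    target_dir="/data/db2/orders",
    username="etl",
    password_file="/user/etl/.db2.password",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping `--num-mappers` low is one way to limit the load each run puts on the source database. Tables that receive updates rather than only inserts would need `--incremental lastmodified` or a CDC tool instead.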

Master Mentor
@Scott Shaw

Please see this

New Contributor

Happy to recommend Attunity Replicate for DB2. When dealing with mainframe systems you also need to deploy Attunity AIS onto the source server, but the footprint was minimal (once the complete load has finished, Replicate just reads the DB logs).

Have used it with SQL Server as well (piece of cake once we met the prerequisites on the source DB) and with IMS (a lot more work due to the inherent complexities of a hierarchical DB, e.g. logical pairs and variants, but we got it all working once we'd uncovered all the design 'features' inherent to the IMS DBs we connected to). It can write to HDFS or connect to Kafka, but alas I never got a chance to try either (just wrote CSV files to an edge node) due to project constraints.