Support Questions

SQLShaw · ‎02-02-2016

We would like an efficient, near-real-time method to ingest mainframe data into HDFS. An open source solution is preferred, but we will also accept third-party suggestions.

bleonhardi · ‎02-02-2016

Sqoop might work, or if you want to be closer to real time, IBM CDC. Change Data Capture from IBM also has a Hadoop connector, and I would hope they can talk to each other. (Mainframe versions are sometimes very different, but I would assume you can forward the changes to a normal CDC instance, which should then have the Big Data connector.)

http://www-03.ibm.com/software/products/en/ibminfochandatacaptforzos

Depending on how you want the data in HDFS, there is, for example, a WebHDFS connector in IBM CDC (now InfoSphere Data Replication?).

https://www.ibm.com/developerworks/community/files/app?lang=en#/file/c04518fb-8ff3-4b2a-9fb9-3873347...
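
For anyone wondering what a write over WebHDFS looks like underneath such a connector, here is a minimal Python sketch of the two-step file create from the WebHDFS REST API. It is not the IBM CDC connector itself; the host, port, path, and user are placeholders, and the helper names are made up for illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=False):
    """Build the NameNode URL for step 1 of a WebHDFS file create."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface the 307 as an HTTPError instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def webhdfs_put(host, port, hdfs_path, user, data):
    """Write bytes to HDFS: ask the NameNode, then PUT to the DataNode."""
    opener = urllib.request.build_opener(_NoRedirect)
    url = webhdfs_create_url(host, port, hdfs_path, user, overwrite=True)
    step1 = urllib.request.Request(url, method="PUT")
    try:
        opener.open(step1)
        raise RuntimeError("expected a 307 redirect from the NameNode")
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        # The NameNode answers with the DataNode location to write to.
        datanode_url = err.headers["Location"]
    step2 = urllib.request.Request(datanode_url, data=data, method="PUT")
    with urllib.request.urlopen(step2) as resp:
        return resp.status  # 201 Created on success
```

The NameNode HTTP port differs between Hadoop versions, so it is passed in rather than hard-coded.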

aervits · ‎02-02-2016

@Scott Shaw I would most likely leverage the FTP processors in NiFi for near real time. Since it's DB2, Attunity Replicate, which happens to be a good partner with HWX, or GoldenGate (PDF) would most likely fit for CDC replication.

Shelton · ‎02-02-2016

Open source Sqoop or Syncsort.

aervits · ‎02-02-2016

@Geoffrey Shelton Okot he requested near real time. Depending on what the definition of near real time is, Sqoop granularity can be as low as 5 minutes; anything less and jobs will step on each other or kill the RDBMS.
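
To make the overlap problem concrete, here is a rough Python sketch of an incremental Sqoop import that skips a cycle if the previous run is still going. The JDBC URL, table, column, paths, and function names are all hypothetical; the Sqoop flags are the standard incremental-import options. The file locking uses fcntl, so this assumes a Unix-like scheduler host.

```python
import fcntl
import subprocess


def sqoop_incremental_cmd(jdbc_url, table, check_column, last_value, target_dir):
    """Build a Sqoop command that imports only rows past last_value."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


def run_if_idle(cmd, lock_path):
    """Run cmd unless another run holds the lock; return True if it ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous import still running, skip this cycle
        subprocess.run(cmd, check=True)
        return True
```

Called from cron every 5 minutes, `run_if_idle(sqoop_incremental_cmd(...), "/tmp/sqoop_orders.lock")` would drop a cycle rather than stack a second import on the source database.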

nsabharwal · ‎02-02-2016

@Scott Shaw

Please see this

Murphy1979 · ‎03-22-2017

Happy to recommend Attunity Replicate for DB2. You need to deploy Attunity AIS onto the source server as well when dealing with mainframe systems, but the footprint was minimal (once the complete load has happened, Replicate just reads the DB logs).

Have used it with SQL Server as well (piece of cake once we met the prerequisites on the source DB) and IMS (a lot more work due to the inherent complexities of a hierarchical DB, e.g. logical pairs and variants, but we got it all working once we'd uncovered all the design 'features' inherent to the IMS DBs we connected to). It can write to HDFS or connect to Kafka, but I never got a chance to try them (just wrote CSV files to the edge node) due to project constraints, alas.

Cloudera Community

What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?