Created on 02-02-2016 03:25 PM - edited 09-16-2022 03:01 AM
We would like an efficient, near-real-time method to ingest mainframe data into HDFS. An open source solution is preferred, but we will also accept third-party suggestions.
Created 02-02-2016 03:51 PM
Sqoop might work, or if you want to be closer to real time, IBM CDC. Change Data Capture from IBM also has a Hadoop connector, and I would hope the two can talk to each other. (Mainframe versions are sometimes very different, but I would assume you can forward the changes to a normal CDC instance, which should then have the BigData connector.)
http://www-03.ibm.com/software/products/en/ibminfochandatacaptforzos
Depending on how you want the data in HDFS, there is, for example, a WebHDFS connector in IBM CDC (now InfoSphere Data Replication?).
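For anyone curious what a WebHDFS target looks like at the REST level, here is a minimal sketch that only builds the URL for the first step of a file create. It is not how IBM CDC does it internally (I don't know that); the host, port 50070, user name and path are all made-up placeholders, and `webhdfs_create_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host: str, port: int, hdfs_path: str,
                       user: str, overwrite: bool = False) -> str:
    """Build the NameNode URL for a WebHDFS op=CREATE request.

    Assumption: simple (non-Kerberos) authentication, so the user is
    passed through the user.name query parameter.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute HDFS path")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    # quote() keeps "/" intact and escapes anything unsafe in the path.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(webhdfs_create_url("namenode.example.com", 50070,
                             "/landing/mainframe/accounts.csv", "etl"))
```

In WebHDFS the create is a two-step exchange: a PUT to this URL is answered with a redirect to a DataNode, and the file contents are then sent with a second PUT to that redirected location.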
Created 02-02-2016 03:26 PM
@Scott Shaw I would most likely leverage the FTP processors in NiFi for near real time. Since it's DB2, for CDC replication I would most likely use Attunity Replicate, which happens to be a good partner of HWX, or GoldenGate (PDF).
Created 02-02-2016 03:27 PM
Open source Sqoop, or Syncsort.
Created 02-02-2016 03:58 PM
@Geoffrey Shelton Okot he requested near real time. Depending on how you define near real time, Sqoop can run at intervals as short as about 5 minutes; anything shorter and the jobs will step on each other or overload the RDBMS.
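To illustrate the "step on each other" problem, here is a rough sketch of a guard that skips a scheduled run when the previous one has not finished. It is plain Python and not a Sqoop feature; `NonOverlappingJob` and the job passed to it are hypothetical names, and in practice the job would be whatever launches your import.

```python
import threading


class NonOverlappingJob:
    """Run a callable on demand, skipping the call if a run is still active."""

    def __init__(self, job):
        self._job = job
        self._lock = threading.Lock()
        self.skipped = 0  # number of triggers dropped because of overlap

    def trigger(self) -> bool:
        # Non-blocking acquire: if a run is in progress, skip rather than queue,
        # so slow imports never pile up against the source database.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._job()
            return True
        finally:
            self._lock.release()


if __name__ == "__main__":
    runner = NonOverlappingJob(lambda: print("import batch"))
    runner.trigger()
```

A scheduler (cron, a timer thread, and so on) would call `trigger()` on every tick; a steadily growing `skipped` count is a sign that the interval is shorter than the import can sustain.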
Created 02-02-2016 03:49 PM
Please see this
Created 03-22-2017 10:05 AM
Happy to recommend Attunity Replicate for DB2. You also need to deploy Attunity AIS onto the source server when dealing with mainframe systems, but the footprint was minimal (once the complete load has happened, Replicate just reads the DB logs from that point on).
Have used it with SQL Server as well (piece of cake once we met the prerequisites on the source DB) and with IMS (a lot more work due to the inherent complexities of a hierarchical DB, e.g. logical pairs and variants, but we got it all working once we'd uncovered all the design 'features' inherent to the IMS DBs we connected to). It can write to HDFS or connect to Kafka, but alas I never got a chance to try those (we just wrote CSV files to an edge node) due to project constraints.