
What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

Solved


We would like an efficient, near-real-time method to ingest mainframe data into HDFS. An open source solution is preferred, but we will also accept third-party suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

Sqoop might work, or if you want to be closer to real time, IBM CDC. Change Data Capture from IBM also has a Hadoop connector, so I would hope they can talk to each other. (Mainframe versions are sometimes very different, but I would assume you can forward the changes to a normal CDC instance, which should then have the Big Data connector.)

http://www-03.ibm.com/software/products/en/ibminfochandatacaptforzos

Depending on how you want the data in HDFS, there is, for example, a WebHDFS connector in IBM CDC (now InfoSphere Data Replication?).

https://www.ibm.com/developerworks/community/files/app?lang=en#/file/c04518fb-8ff3-4b2a-9fb9-3873347...
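
For the Sqoop route, a minimal batch import might look like the sketch below. This is an assumption-laden example, not a tested setup: the host, port, database, schema, table, and target directory are all placeholders, and DB2 for z/OS needs the IBM JDBC driver jar on Sqoop's classpath.

```
# Minimal Sqoop import sketch for DB2 on z/OS (all identifiers are placeholders).
# Assumes the IBM DB2 JDBC driver (e.g. db2jcc4.jar) has been copied into $SQOOP_HOME/lib.
sqoop import \
  --connect jdbc:db2://mainframe-host:446/MYDB \
  --username dbuser -P \
  --table MYSCHEMA.MYTABLE \
  --target-dir /data/db2/mytable \
  --num-mappers 4
```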

6 REPLIES

Re: What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

Mentor

@Scott Shaw For near real time I would most likely leverage the FTP processors in NiFi. Since it's DB2, Attunity Replicate (which happens to be a good partner of HWX) or GoldenGate (PDF) would most likely be the choice for CDC replication.
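
As a rough outline of what that NiFi route could look like (these are real NiFi processor names, but the endpoint, paths, and scheduling are assumptions, not a tested flow):

```
# Hypothetical NiFi flow outline, not a tested configuration:
ListFTP       # poll the mainframe FTP endpoint for newly arrived datasets
 -> FetchFTP  # pull the content of each listed file
 -> PutHDFS   # land the files in HDFS (point it at core-site.xml/hdfs-site.xml and a target dir)
```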

Re: What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

Mentor

Open source Sqoop or Syncsort.

Re: What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

Mentor

@Geoffrey Shelton Okot He requested near real time. Depending on what the definition of near real time is, Sqoop granularity can be as low as 5 minutes; anything less and the jobs will step on each other or kill the RDBMS.
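
To illustrate the granularity point, a 5-minute incremental pull could look roughly like this sketch (host, table, and timestamp column names are assumptions, and the source table needs a last-modified column for `--incremental lastmodified` to key on):

```
# Hedged sketch: incremental Sqoop import keyed on a timestamp column (placeholders throughout).
sqoop import \
  --connect jdbc:db2://mainframe-host:446/MYDB \
  --username dbuser -P \
  --table MYSCHEMA.MYTABLE \
  --target-dir /data/db2/mytable \
  --incremental lastmodified \
  --check-column UPDATED_TS \
  --last-value "2016-01-01 00:00:00" \
  --append

# Scheduled no more often than every 5 minutes, e.g. from cron
# (the wrapper script path is hypothetical):
# */5 * * * * /opt/ingest/sqoop_mytable.sh
```

A saved `sqoop job` would also track `--last-value` between runs automatically instead of it being passed by hand.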

Re: What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

@Scott Shaw

Please see this.


Re: What is the fastest method (best practice) for pulling mainframe DB2 System Z data into HDFS?

New Contributor

Happy to recommend Attunity Replicate for DB2. You need to deploy Attunity AIS onto the source server as well when dealing with mainframe systems, but the footprint was minimal (after the complete load has happened, Replicate just reads the DB logs from that point on).

Have used it with SQL Server as well (a piece of cake once we met the prerequisites on the source DB) and IMS (a lot more work due to the inherent complexities of a hierarchical DB, e.g. logical pairs and variants), but we got it all working once we'd uncovered all the design 'features' inherent to the IMS DBs we connected to. It can write to HDFS or connect to Kafka, but I never got a chance to try either (we just wrote CSV files to an edge node) due to project constraints, alas.