Created 01-31-2017 04:46 AM
Hi, we have a huge number of mainframe files in EBCDIC format. These files were created by mainframe systems and are now stored in HDFS as EBCDIC files. I need to read these files (copybooks are available), split them into multiple files based on record type, and store them as ASCII files in HDFS.
Created 01-31-2017 05:08 AM
You can use the following project. It uses JRecord to do the conversion.
https://github.com/tmalaska/CopybookInputFormat
You can use Spark to read your EBCDIC files from Hadoop and convert them to ASCII using the above library.
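Sketching roughly what that could look like: this is only an illustration, not tested code. The package path of `CopybookInputFormat`, the configuration key for the copybook location, and the record-type prefixes are all assumptions; check the repository's README for the real names before using it.

```scala
// SKETCH ONLY. Class/package names and the copybook config key below are
// assumptions based on the CopybookInputFormat project; verify against its README.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
import com.cloudera.sa.copybook.mapreduce.CopybookInputFormat // assumed package path

object EbcdicToAscii {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("EbcdicToAscii"))

    // Point the input format at the copybook on HDFS (config key assumed)
    val conf = sc.hadoopConfiguration
    conf.set("copybook.inputformat.cb.hdfs.path", "hdfs:///copybooks/myrecord.cpy")

    // The input format decodes EBCDIC records into delimited ASCII text
    val records = sc.newAPIHadoopFile(
      "hdfs:///data/ebcdic/input",
      classOf[CopybookInputFormat],
      classOf[LongWritable],
      classOf[Text],
      conf
    ).map(_._2.toString)

    // Split by record type (here, a hypothetical leading type field) and save
    records.filter(_.startsWith("A")).saveAsTextFile("hdfs:///data/ascii/typeA")
    records.filter(_.startsWith("B")).saveAsTextFile("hdfs:///data/ascii/typeB")
  }
}
```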
Created 01-31-2017 08:06 PM
Hi Mqureshi
I'm very new to this, so I don't know how to do this yet, but I will check some online resources and try it. If I struggle, I will come back and ask you for help; if it works, I will let you know as well. Thanks.
Created 01-31-2017 06:28 PM
Sqoop has a connector to import Mainframe data into HDFS:
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_import_mainframe_literal
Created 01-31-2017 08:06 PM
But this will only work if the file on the mainframe is like a normal text file, right? In my case, the files are in EBCDIC format (with multiple occurrences) and contain some junk values, so can we still do this with the Sqoop connector? I went over the details in the link and couldn't see anything related to EBCDIC files, but if you think this will work, please share more details; I'm interested in learning about this.
Created 01-31-2017 09:01 PM
The connector is a contribution from Syncsort. Syncsort has decades of experience with building tools for Mainframe data ingestion.
I have used Sqoop extensively; however, never for Mainframe data. Syncsort states: "Each data set will be stored as a separate HDFS file and EBCDIC encoded fixed length data will be stored as ASCII encoded variable length text on HDFS"
http://blog.syncsort.com/2014/06/big-data/big-iron-big-data-mainframe-hadoop-apache-sqoop/
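For reference, the basic shape of an `import-mainframe` invocation (host, dataset, user, and directory names below are placeholders) is:

```shell
# Import all sequential datasets in a partitioned dataset (PDS) into HDFS.
# Host, dataset, username, and target directory are placeholders.
sqoop import-mainframe \
  --connect mainframe.example.com \
  --dataset MYUSER.EMPDATA \
  --username someuser -P \
  --target-dir /data/empdata
```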
There is also a Spark connector to import Mainframe data.
Created 01-31-2017 08:44 PM
Fair enough. See the new answer by @bpreachuk. I was assuming you are looking for free tools, but if you can get Syncsort, or if you already have it, that's the easiest way to do this.
Created 01-31-2017 08:40 PM
I am not sure if you have the ability to use a 3rd-party tool, but one of our trusted partners is Syncsort. If you've used the mainframe before, you'll know who they are. Dealing with EBCDIC conversions, copybooks, etc. are features that they excel at and provide in their flagship tool. It's called DMX-h, and it would do what you need (in fact, it can be your data integration tool for all data, not just mainframe data). http://www.syncsort.com/en/Products/BigData/DMXh
Created 01-31-2017 09:40 PM
Hi Bpreachuk
Thanks for the answer. No, we do not have the option to buy Syncsort.
Created 10-03-2017 08:20 PM
@karthick baskaran If you are still looking for an adaptor, ping me at arjun.mahajan@bitwiseglobal.com. We have recently released a Hadoop adaptor for mainframe data. Thanks.
Created 08-22-2018 07:05 PM
@Karthik Narayanan, you can use Cobrix to parse the EBCDIC files with Spark and store them on HDFS in whatever format you want. It is open source.
DISCLAIMER: I work for ABSA and I am one of the developers behind this library. Our focus has been: 1) ease of use, 2) performance.
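A minimal Spark (Scala) sketch of the above (file paths are placeholders; the `cobol` data source and the `copybook` option come from the Cobrix README):

```scala
import org.apache.spark.sql.SparkSession

object CobrixExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CobrixExample").getOrCreate()

    // Read the EBCDIC file; Cobrix uses the copybook to decode each record
    val df = spark.read
      .format("cobol")
      .option("copybook", "hdfs:///copybooks/myrecord.cpy")
      .load("hdfs:///data/ebcdic/input")

    // Write out in any Spark-supported format, e.g. Parquet on HDFS
    df.write.parquet("hdfs:///data/parquet/output")
  }
}
```

From there you can filter the DataFrame by a record-type column and write each subset to its own directory.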
Created 06-12-2020 08:41 AM
Hello,
Does Cobrix support Python? I only see Scala APIs at https://github.com/AbsaOSS/cobrix
Please advise.
Thanks
Sreedhar Y
Created 09-13-2018 11:27 AM
@Binu Mathew Hi Binu, have you been able to resolve your issue? If yes, could you please share the solution? I'm in the same boat.
Created 09-13-2018 05:23 PM
@Raghavendra Gupta, have you tried Cobrix?