To make it faster and easier to move data into and out of Hadoop, BT has developed a new data integration tool, Skool, and released it to the wider community as an open-source product.
The code is available at:
https://github.com/BT-Plc/skool
This tool covers the following aspects:
- Seamless data transfer from Hadoop into a relational database
- Seamless data transfer from a relational database into Hadoop
- File transfer and Hive table creation for file-based transfers into Hadoop
- Automatic generation and deployment of file creation scripts and jobs from Hadoop or Hive tables
Key features:
- The tool generates code that can be executed automatically (or scheduled) for delta and milestone replication at a defined data refresh frequency.
- The tool can be configured to select which tables, columns, or files are transferred into or out of Hadoop.
- Built-in storage optimization to deliver performant code: the tool takes into account table size, database partitions, file formats and compression.
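To make the difference between the two replication modes concrete, here is a minimal sketch of how a pull query could be built from a table/column selection. It is not Skool's generated code or configuration format. The names `TransferSpec`, `build_query`, `mode` and `delta_column` are hypothetical, and the sketch assumes that "milestone" means a full point-in-time copy and "delta" means only rows changed since the previous run.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TransferSpec:
    """Hypothetical description of one table to pull (not Skool's real config)."""
    table: str
    columns: Sequence[str]
    mode: str = "milestone"              # assumed values: "milestone" or "delta"
    delta_column: Optional[str] = None   # e.g. a last-modified timestamp column


def build_query(spec: TransferSpec, last_value: Optional[str] = None) -> str:
    """Return the SELECT statement for a single pull of the selected columns."""
    query = f"SELECT {', '.join(spec.columns)} FROM {spec.table}"
    if spec.mode == "milestone":
        # Assumed meaning: copy the whole table as it stands right now.
        return query
    if spec.mode == "delta":
        if spec.delta_column is None:
            raise ValueError("delta mode needs a delta_column")
        if last_value is None:
            # First run: nothing has been copied yet, so take everything.
            return query
        # Illustration only; production code should use bind parameters.
        return f"{query} WHERE {spec.delta_column} > '{last_value}'"
    raise ValueError(f"unknown mode: {spec.mode}")


spec = TransferSpec(
    table="ORDERS",
    columns=["ORDER_ID", "STATUS", "UPDATED_AT"],
    mode="delta",
    delta_column="UPDATED_AT",
)
print(build_query(spec, "2020-01-01 00:00:00"))
# SELECT ORDER_ID, STATUS, UPDATED_AT FROM ORDERS WHERE UPDATED_AT > '2020-01-01 00:00:00'
```

A scheduler would call such a function once per refresh interval and store the highest `delta_column` value seen, so that the next delta pull starts where the previous one stopped.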
Benefits of using Skool:
1. All scripts are provided to the user and can be customized as needed
2. Code consistency is maintained
3. Informative logging while the application runs
4. Audit and lineage information recorded in a Hive table for every action
5. Custom Housekeeping
6. Support for both Avro and text files
7. Data can be imported from an Oracle database as well as from a server that pushes files to Hadoop
8. Compression of stored data
9. Tables created over stored data
10. Automatic partitioning of tables
11. Support for both incremental and milestone data pulls
12. Customized job scheduling
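As an illustration of what "tables created over stored data" with partitions can look like in Hive, the sketch below builds a `CREATE EXTERNAL TABLE` statement for Avro or text files. It is an assumption about the general shape of such a statement, not the DDL that Skool generates. The function name `hive_ddl`, the example table, its columns and the HDFS path are all hypothetical.

```python
from typing import Sequence, Tuple

Column = Tuple[str, str]  # (column name, Hive type)

# Hive storage clauses for the two file types named in the list above.
STORAGE = {"avro": "AVRO", "text": "TEXTFILE"}


def hive_ddl(table: str,
             columns: Sequence[Column],
             partitions: Sequence[Column],
             file_format: str,
             location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement over files already in HDFS."""
    if file_format not in STORAGE:
        raise ValueError(f"unsupported file format: {file_format}")
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    lines = [f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (", f"  {cols}", ")"]
    if partitions:
        parts = ", ".join(f"`{name}` {hive_type}" for name, hive_type in partitions)
        lines.append(f"PARTITIONED BY ({parts})")
    lines.append(f"STORED AS {STORAGE[file_format]}")
    lines.append(f"LOCATION '{location}'")
    return "\n".join(lines)


# Hypothetical example: an orders table partitioned by load date.
print(hive_ddl(
    table="orders",
    columns=[("order_id", "BIGINT"), ("status", "STRING")],
    partitions=[("load_date", "STRING")],
    file_format="avro",
    location="/data/landing/orders",
))
```

Because the table is external, dropping it in Hive leaves the underlying files in place, which suits data that is landed by a separate transfer job.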