Skool – new open-source data integration tool for Hadoop

To make exchanging data with Hadoop faster and easier, BT has developed a new data integration tool, Skool, and released it as an open-source product to the wider community.

 

The code is available at the following URL:

https://github.com/BT-Plc/skool

 

This tool covers the following aspects:

  • Seamless data transfer from Hadoop into a relational database
  • Seamless data transfer from a relational database into Hadoop
  • File transfer and Hive table creation for file-based transfers into Hadoop
  • Automatic generation and deployment of file creation scripts and jobs from Hadoop or Hive tables

Key Features

  • The tool generates code that can be executed automatically (or scheduled) for delta and milestone replication at a defined data refresh frequency.
  • The tool can be configured to select which tables, columns, or files are transferred into or out of Hadoop.
  • Built-in storage optimization to deliver performant code: the tool takes into account table size, database partitions, file formats and compression.
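
To make the delta versus milestone distinction above more concrete, here is a minimal Python sketch. It is not Skool's code or API. `TransferSpec`, `build_extract_query`, the `watermark_column` field and the `:last_watermark` bind name are all hypothetical, and it assumes that "milestone" means a full snapshot of the selected columns while "delta" means only rows changed since the previous run.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class TransferSpec:
    """Hypothetical description of one table to pull into Hadoop."""
    table: str
    columns: Sequence[str] = ()              # empty means "all columns"
    mode: str = "milestone"                  # "milestone" or "delta"
    watermark_column: Optional[str] = None   # required for delta pulls


def build_extract_query(
    spec: TransferSpec, last_watermark: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (sql, bind_params) for the source-side extract.

    Assumption: a milestone pull reads the whole table, a delta pull
    reads only rows whose watermark column is newer than the last run.
    """
    cols = ", ".join(spec.columns) if spec.columns else "*"
    query = f"SELECT {cols} FROM {spec.table}"

    if spec.mode == "milestone":
        return query, {}
    if spec.mode != "delta":
        raise ValueError(f"unknown mode: {spec.mode!r}")
    if not spec.watermark_column:
        raise ValueError("delta pulls need a watermark_column")
    if last_watermark is None:
        # First delta run: nothing loaded yet, so pull everything.
        return query, {}
    # Use a bind parameter rather than pasting the value into the SQL.
    return (
        f"{query} WHERE {spec.watermark_column} > :last_watermark",
        {"last_watermark": last_watermark},
    )


if __name__ == "__main__":
    spec = TransferSpec(
        table="ORDERS",
        columns=["ORDER_ID", "UPDATED_AT"],
        mode="delta",
        watermark_column="UPDATED_AT",
    )
    print(build_extract_query(spec, "2024-01-01 00:00:00"))
```

Keeping the table, column list and mode in one small spec object mirrors the idea of a configurable selection of what gets transferred. How Skool itself stores this configuration is documented in the repository, not here.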

Benefits of using Skool:

1. All scripts are provided to the user and can be customized as needed

2. Code consistency is maintained

3. Informative logging while the application is running

4. Audit and lineage information recorded in a Hive table for every action

5. Custom Housekeeping

6. Support for both Avro and text files

7. Data can be imported from an Oracle database as well as from a server that pushes files to Hadoop

8. Compression of the stored data

9. Tables created over the stored data

10. Automatic partitioning of tables

11. Support for both incremental and milestone data pulls

12. Customized job scheduling
