How do SAP’s products integrate with Hadoop?

Broadly, the tools that integrate SAP with Hadoop fall into three categories:

  • Tools that help you get data OUT of SAP
  • Tools that help you get data IN to SAP
  • Tools that help you QUERY data across SAP and Hadoop

Getting data OUT of SAP systems like ECC and HANA

  • Using SAP Data Store Objects
    • A DSO (DataStore Object) is the storage object in SAP BW that holds cleansed and consolidated transaction or master data at the lowest level of granularity.
    • DSOs can be used as a pass-through for extraction from ECC. However, this comes at a price in terms of footprint, hardware, and complexity, and some extractions for deltas or real-time streaming won’t work with this architecture. It’s a workable solution if you focus on a few tables rather than the whole universe of thousands of tables that SAP bundles.
  • SAP Specific Third Party Ingestion Tools
    • Datavard Glue
      • It integrates SAP and Hadoop closely by embedding the integration on the SAP side.
      • It provides a lightweight ETL workbench to identify relevant data, create data models on Hive, and populate them with data. It connects SAP and Hadoop through a middleware component called “Storage Management”, which bridges different access mechanisms such as JDBC/ODBC, REST, and SAP RFC. Using Storage Management as middleware, Datavard Glue lets you develop data models on Hadoop, set up ETL processes to extract and transform data from SAP, and consume Hadoop data from SAP.
      • It supports all SAP products, especially the business applications built on Netweaver and HANA (both on premise and in the cloud).
      • Reference - http://www.datavard.com/en/blog-integrating-sap-with-hadoop-its-possible-but/
    • VirtDB
      • Offers a Data Distributor engine that executes ABAP reports in the background by scheduling them in the SAP job scheduler. The engine connects to HDFS through the WebHDFS client to upload CSV files of the extracted data (see the WebHDFS sketch after this list).
  • Cloud-based SAP solutions can be integrated via the web service/REST APIs that the cloud offerings provide (see the OData sketch after this list).
  • SAP specific Change data capture tools
    • Attunity Replicate for SAP - It’s a data replication/integration solution that decodes SAP’s complex, application-specific data structures and converts SAP data into formats suited to your target Hadoop distribution or big data analytics environment. Attunity Replicate includes change data capture (CDC) to enable real-time data integration.
  • Plain old Sqoop
    • If you know exactly which SAP tables you want to extract, Sqoop is the easiest way to bring the data out (see the Sqoop sketch after this list).
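
Regardless of which extractor produces the files, the WebHDFS upload step that tools like VirtDB rely on is a plain two-step REST exchange. Here is a minimal sketch in Python (the hostname, port, paths, and user are placeholders; it assumes the requests library and a WebHDFS-enabled NameNode):

```python
import requests

NAMENODE = "http://namenode.example.com:50070"    # placeholder NameNode host + WebHDFS port
HDFS_PATH = "/data/sap/mara/mara_20240101.csv"    # placeholder target path in HDFS

# Step 1: ask the NameNode where to write; it answers with a 307 redirect to a DataNode.
resp = requests.put(
    f"{NAMENODE}/webhdfs/v1{HDFS_PATH}",
    params={"op": "CREATE", "overwrite": "true", "user.name": "etl"},
    allow_redirects=False,
)
datanode_url = resp.headers["Location"]

# Step 2: stream the extracted CSV to the DataNode URL returned above.
with open("mara_extract.csv", "rb") as f:
    upload = requests.put(datanode_url, data=f)
upload.raise_for_status()    # 201 Created on success
```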
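For the cloud products, the REST/OData pattern is the same whichever offering you use: page through an entity set over HTTPS and land the JSON on the Hadoop side. A purely illustrative sketch follows; the tenant URL, entity set, and credentials are hypothetical, not a documented endpoint:

```python
import requests

# Read one page of entities from a hypothetical OData v2 service exposed by a cloud SAP product.
resp = requests.get(
    "https://example-tenant.sap.example.com/odata/v2/SalesOrders",
    params={"$top": 100, "$format": "json"},
    auth=("api_user", "secret"),
    timeout=30,
)
resp.raise_for_status()

# OData v2 wraps collections under "d" -> "results"; land these records in HDFS, Kafka, etc.
records = resp.json()["d"]["results"]
print(len(records), "records fetched")
```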
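And for the Sqoop route, the import is a single command once the source JDBC driver is available to Sqoop. Below is a sketch that pulls one table from HANA over the generic JDBC path; the host, port, credentials, and table name are placeholders, and it assumes the HANA JDBC driver jar (ngdbc.jar) is on Sqoop’s classpath:

```python
import subprocess

# Build and run a Sqoop import for a single, known SAP table.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:sap://hana-host.example.com:30015",   # placeholder HANA host/port
    "--driver", "com.sap.db.jdbc.Driver",
    "--username", "EXTRACT_USER",
    "--password-file", "/user/etl/hana.pwd",                 # keep the password off the command line
    "--table", "SAPSR3.MARA",                                # example: material master table
    "--target-dir", "/data/sap/mara",
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```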

Getting data INTO SAP systems

Querying Data in Hadoop from the SAP end

  • Both SAP HANA and SybaseIQ can federate queries across themselves and Hive.
  • Hive tables can be mapped as remote (proxy) tables in SybaseIQ, and a single query can then reference tables spanning the two systems (see the first sketch after this list).
  • Smart Data Access (SDA) is a data virtualization feature in SAP HANA that lets customers access data virtually from remote sources such as Hadoop, Oracle, Teradata, SQL Server, and SAP databases and combine it with data that resides in an SAP HANA database (see the second sketch after this list).
  • SAP BusinessObjects BI can connect to Apache Hive schemas just as it connects to any other database, and it can combine that data with data from SAP HANA and other non-SAP sources.
  • Reference - https://blogs.sap.com/2017/07/19/bridging-two-worlds-integration-of-sap-and-hadoop-ecosystems/
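
A sketch of the SybaseIQ side, assuming an ODBC DSN for the IQ server (IQ_DSN) and a Hive ODBC DSN (HIVE_DSN) reachable from the IQ host. All names are placeholders, and the exact four-part location string depends on your Hive driver:

```python
import pyodbc

# Connect to the IQ server through a preconfigured ODBC DSN (placeholder names throughout).
iq = pyodbc.connect("DSN=IQ_DSN;UID=dba;PWD=secret", autocommit=True)
cur = iq.cursor()

# Declare the Hive endpoint as a remote server via Component Integration Services.
cur.execute("CREATE SERVER hive_srv CLASS 'ODBC' USING 'DSN=HIVE_DSN'")

# Map a Hive table as a proxy (remote) table inside IQ. The location string
# (server.catalog.owner.table) may need adjusting for your Hive ODBC driver.
cur.execute("CREATE EXISTING TABLE sales_hive AT 'hive_srv..default.sales'")

# A single query can now span an IQ-resident table and the Hive-backed proxy table.
cur.execute("""
    SELECT c.customer_id, SUM(s.amount)
    FROM customers c
    JOIN sales_hive s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
""")
print(cur.fetchall())
```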
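And a sketch of the Smart Data Access side in HANA, issued here through the hdbcli Python client. The remote-source name, DSN, schema, and table names are placeholders, and the ODBC adapter name can differ by Hadoop distribution (hiveodbc is the generic one):

```python
from hdbcli import dbapi   # SAP HANA Python client

# Placeholder connection details for the HANA system.
conn = dbapi.connect(address="hana-host.example.com", port=30015,
                     user="DBADMIN", password="secret")
cur = conn.cursor()

# Register Hive as a remote source; "HIVE_DSN" is a placeholder ODBC DSN on the HANA host.
cur.execute("""
    CREATE REMOTE SOURCE "HIVE_SRC" ADAPTER "hiveodbc"
    CONFIGURATION 'DSN=HIVE_DSN'
    WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hive;password=secret'
""")

# Expose a Hive table as a virtual table inside HANA ("<NULL>" marks the unused database level).
cur.execute("""
    CREATE VIRTUAL TABLE "MYSCHEMA"."VT_SALES_HIVE"
    AT "HIVE_SRC"."<NULL>"."default"."sales"
""")

# Federated query: join HANA-resident data with the Hive-backed virtual table.
cur.execute("""
    SELECT h.CUSTOMER_ID, SUM(v.AMOUNT)
    FROM "MYSCHEMA"."CUSTOMERS" h
    JOIN "MYSCHEMA"."VT_SALES_HIVE" v ON v.CUSTOMER_ID = h.CUSTOMER_ID
    GROUP BY h.CUSTOMER_ID
""")
print(cur.fetchall())
```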