Suggestions for data ingestion to hadoop layers

We are getting involved in a landscape where data from SAP BW will be sent to Hadoop HDFS for further application needs.

Data from the ADSOs will be pulled and stored in Hadoop at the staging layer first; the staging tables (Hive) will then be joined and propagated to the Hadoop core layer.
Hadoop details: Cloudera CDH, Hive, Impala, Spark 1.6
Suggestions needed:
A. What would be the best option for getting data from about 300 SAP ADSOs into Hadoop on a daily basis?

B. Based on application needs, multiple stage tables (ADSOs) have to be joined to form one core table. For example, one core table might be formed from 10-15 stage tables (ADSOs), and we have around 80 core tables that follow this pattern.
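
To make scenario B concrete, here is a minimal sketch of the kind of statement each core-table build would boil down to. It only generates HiveQL text; it does not run anything. The table names, the join key, the column list, and the helper `build_core_table_sql` are all hypothetical, and it assumes every stage table joins to the first one on a single shared key, which is probably simpler than our real ADSOs.

```python
def build_core_table_sql(core_table, stage_tables, join_key, select_columns):
    """Build one HiveQL statement that left-joins stage tables into a core table.

    Assumption (hypothetical): every stage table shares `join_key` with the
    first stage table. Real ADSOs may need different keys per join.
    """
    if not stage_tables:
        raise ValueError("at least one stage table is required")
    if not select_columns:
        raise ValueError("at least one output column is required")

    lines = [
        "INSERT OVERWRITE TABLE {0}".format(core_table),
        "SELECT " + ", ".join(select_columns),
        "FROM {0} t0".format(stage_tables[0]),
    ]
    # Each further stage table gets alias t1, t2, ... and joins back to t0.
    for i, table in enumerate(stage_tables[1:], start=1):
        lines.append(
            "LEFT JOIN {0} t{1} ON t0.{2} = t{1}.{2}".format(table, i, join_key)
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example: one core table fed by three stage tables.
    print(build_core_table_sql(
        "core.sales_doc",
        ["stg.adso_header", "stg.adso_item", "stg.adso_partner"],
        "doc_id",
        ["t0.doc_id", "t1.amount", "t2.partner_id"],
    ))
```

With 10-15 stage tables per core table and about 80 core tables, we would end up with roughly 80 statements of this shape, each with 9-14 joins, run daily.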

Considering the number of joins to be performed and the unavailability of Spark 2.0 or later, what would be the best options to go with? The Hive route would be slower, and the expectation is a fast ETL turnaround.

Any suggestions would be helpful.