Created 06-21-2022 03:05 AM
Dear all,
I have a question about adding a new set of data sources to the Cloudera CDP platform currently installed in the Ooredoo network.
Currently, the OPEX optimizer uses Telco core and network access data sources along with Charging System data sources.
Now we have an opportunity to add a new set of IT (non-Telco) data sources, which may include offline batch sources as well as streaming data (e.g., for the Fraud Management and Revenue Assurance departments). Shall we:
If the second option is chosen, which components and tools would be used to ingest, store, and process the new data sources, and which nodes would be involved in implementing them?
Can we dedicate some worker nodes to the IT data sources only, within the same HDFS? If so, how should the components and tools be configured?
Best Regards,
mamoune,
Created 07-04-2022 05:03 PM
Hi @mamoune, you can ingest multiple data source types concurrently into the Cloudera CDP platform, but make sure inbound connectivity is configured from each source to the destination CDP cluster. Various components and connectors are available for both moving and transforming data from source systems. Ingesting, storing, and processing the new data sources typically requires a considerable amount of planning, which is one of the challenges of data pipeline integration. For example, Cloudera Morphlines is an open-source framework that reduces the time and skills required to build or change Search indexing applications. A morphline is a rich configuration file that simplifies defining an ETL transformation chain. Use these chains to consume any kind of data from any data source, process the data, and load the results into Cloudera Search. Because they execute in a small, embeddable Java runtime system, morphlines can be used for near real-time applications as well as batch processing applications.
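To illustrate what a "transformation chain" means in practice, here is a minimal Python sketch. It is not the Morphlines API (real morphlines are written as configuration files and run on the JVM); it only mimics the idea that each record passes through an ordered list of commands and any command can drop it. The record layout (msisdn, amount, timestamp) and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List, Optional

# A record is a plain dict; a command returns a new record or None to drop it.
Record = dict
Command = Callable[[Record], Optional[Record]]


def parse_csv_line(record: Record) -> Optional[Record]:
    """Split a raw 'msisdn,amount,timestamp' line into named fields (hypothetical layout)."""
    parts = [p.strip() for p in record["raw"].split(",")]
    if len(parts) != 3:
        return None  # malformed line: drop it
    msisdn, amount, timestamp = parts
    return {"msisdn": msisdn, "amount": amount, "timestamp": timestamp}


def to_float_amount(record: Record) -> Optional[Record]:
    """Convert the amount field to a float; drop the record if it is not numeric."""
    try:
        return {**record, "amount": float(record["amount"])}
    except ValueError:
        return None


def drop_non_positive(record: Record) -> Optional[Record]:
    """Keep only records with a strictly positive amount."""
    return record if record["amount"] > 0 else None


def run_chain(records: Iterable[Record], commands: List[Command]) -> List[Record]:
    """Pass every record through the commands in order; stop early if one drops it."""
    output = []
    for record in records:
        for command in commands:
            record = command(record)
            if record is None:
                break  # a command dropped the record
        else:
            output.append(record)  # reached only if no command dropped it
    return output


# The chain itself is just an ordered list, much like the command list in a morphline.
CHAIN: List[Command] = [parse_csv_line, to_float_amount, drop_non_positive]

if __name__ == "__main__":
    sample = [
        {"raw": "21612345678, 12.5, 2022-06-21T03:05:00"},
        {"raw": "21687654321, abc, 2022-06-21T03:06:00"},
    ]
    for row in run_chain(sample, CHAIN):
        print(row)
```

In a real deployment the last step would load the surviving records into a target such as Cloudera Search rather than print them; that part is left out here because it depends on your cluster setup.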
Created 07-10-2022 10:25 PM
@mamoune, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as that will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,