Two concurrent data source types on the same CDP

New Contributor

Dear all,

I have a question about adding a new set of data sources to the Cloudera CDP platform currently installed in the Ooredoo network.

Currently, the OPEX optimizer uses Telco core and network access data sources, along with Charging System data sources.

We now have an opportunity to add a new set of IT (non-Telco) data sources, which may include offline batch data as well as streaming data (e.g., for the Fraud Management and Revenue Assurance departments). Should we:

  • Install a new Cloudera CDP platform fully dedicated to the new (IT) data sources, or
  • Use the existing CDP platform?

If the second option is chosen, which components and tools would be used to ingest, store, and process the new data sources, and which nodes would be involved in implementing them?

Can we dedicate some worker nodes to the IT data sources only, within the same HDFS? If so, how should the components and tools be configured?

Best Regards,

mamoune,

1 ACCEPTED SOLUTION

Expert Contributor

Hi @mamoune, you can ingest multiple concurrent data source types into the same Cloudera CDP platform, but make sure an inbound connection is properly configured from each source to the destination CDP cluster. Various components and connectors are available for both moving and transforming data from source systems. Ingesting, storing, and processing the new data sources typically requires a considerable amount of planning, which is one of the challenges of data pipeline integration.

For example, Cloudera Morphlines is an open-source framework that reduces the time and skills required to build or change Search indexing applications. A morphline is a rich configuration file that simplifies defining an ETL transformation chain. You can use these chains to consume any kind of data from any data source, process the data, and load the results into Cloudera Search. Because they execute in a small, embeddable Java runtime system, morphlines can be used for near-real-time applications as well as batch processing applications.
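
To make the idea of a transformation chain more concrete, here is a rough sketch in plain Python. It is not the Morphlines configuration syntax or its Java API; it only mimics the pattern in which each record passes through an ordered list of commands, and any command may drop the record. All function names, field names, and sample values are hypothetical, and the final "load" step is just a Python list standing in for a real sink such as a search index.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
# A command takes a record and returns the transformed record,
# or None to drop it from the chain.
Command = Callable[[Record], Optional[Record]]


def parse_csv_line(record: Record) -> Optional[Record]:
    """Split a raw 'msisdn,amount,source' line into named fields (hypothetical layout)."""
    parts = [p.strip() for p in str(record["raw"]).split(",")]
    if len(parts) != 3:
        return None  # malformed line: drop it
    return {"msisdn": parts[0], "amount": parts[1], "source": parts[2]}


def drop_invalid(record: Record) -> Optional[Record]:
    """Convert the amount to a float, dropping records where that fails."""
    try:
        record["amount"] = float(str(record["amount"]))
    except ValueError:
        return None
    return record


def tag_domain(record: Record) -> Optional[Record]:
    """Label the record as IT or Telco based on its (assumed) source system name."""
    it_sources = {"fraud", "revenue_assurance"}
    record["domain"] = "IT" if record["source"] in it_sources else "Telco"
    return record


def run_chain(records: Iterable[Record], commands: List[Command]) -> List[Record]:
    """Pass each record through every command in order and collect the survivors."""
    loaded: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for command in commands:
            current = command(current)
            if current is None:
                break  # a command rejected the record
        else:
            loaded.append(current)  # reached only if no command dropped it
    return loaded
```

Calling `run_chain(rows, [parse_csv_line, drop_invalid, tag_domain])` on a list of `{"raw": "..."}` dictionaries returns only the rows that every step accepted. The same chain can be fed from a batch file or from a streaming consumer, which is the property the morphline description above relies on.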


2 REPLIES

Community Manager

@mamoune, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.



Regards,

Vidya Sargur,
Community Manager

