Need to design system architecture

New Contributor

Hi Community,

I have a system that reads data from a source and enriches it for further business use.

I need 3 clusters: one with ETL capabilities (NiFi), a second for storage alone, and a third where I can run my business workloads using Spark.

We have almost 10 Billion Load and about 15,000 process, about 10% of which are custom process, because NiFi is not able to do custom lookup, custom sink, custom filter, custom mapper, and so on, which uses many threads.

Please also recommend an Azure machine series for each of these 3 clusters.

Thank You

Regards

Chetan K C

2 REPLIES

Community Manager

@Chetankc, welcome to our community! To help you get the best possible answer, I have tagged our experts @MattWho @steven-matison @cotopaul @SAMSAL, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



Master Mentor

@Chetankc 
From a NiFi perspective, there is not much guidance that can be given with so little information.

  1. What does "10 Billion Load" mean? Is it the number of unique files being ingested into NiFi? What is the average file size? What is the rate of ingest?
  2. What is "15,000 process"? Is this the number of NiFi processors added to the NiFi canvas? What types of processors are being used? Do your dataflow(s) do a lot of content modification? Have you tested throughput performance and done any performance tuning? 15,000 processors is a lot of execution scheduling against your CPU cores. In your load testing, what was your CPU load average? What was the memory impact?
  3. You also have custom NiFi components. Are you saying that these custom components use many threads, or that the 15,000 components in total use a lot of threads? What does "a lot of threads" mean here? Are any of these long-running threads, or are they all millisecond thread executions?

What kind of performance and throughput are you achieving now, and on what type of setup (how many nodes in your NiFi cluster, number of CPU cores, JVM heap settings, type of disk, etc.)?

Thank you,
Matt
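
To illustrate why the ingest rate and average size asked about above matter for sizing, here is a minimal back-of-the-envelope sketch. It is not from the thread: the function name `estimate_ingest`, the 1 KB average record size, and the 1-day and 30-day windows are all hypothetical assumptions, since the original question does not say what "10 Billion Load" measures or over what period.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class IngestEstimate:
    records_per_second: float
    megabytes_per_second: float


def estimate_ingest(total_records: int, window_days: float,
                    avg_record_bytes: float) -> IngestEstimate:
    """Convert a total record count into sustained average rates.

    All three inputs are assumptions supplied by the caller; the thread
    itself gives only the figure "10 Billion" with no unit or period.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    seconds = window_days * SECONDS_PER_DAY
    records_per_second = total_records / seconds
    # Decimal megabytes (1 MB = 1,000,000 bytes).
    megabytes_per_second = records_per_second * avg_record_bytes / 1_000_000
    return IngestEstimate(records_per_second, megabytes_per_second)


if __name__ == "__main__":
    # Hypothetical scenarios: the same 10 billion records arriving
    # within one day versus spread over thirty days.
    for days in (1, 30):
        est = estimate_ingest(10_000_000_000, days, avg_record_bytes=1_000)
        print(f"{days:>2} day(s): {est.records_per_second:,.0f} records/s, "
              f"{est.megabytes_per_second:,.1f} MB/s")
```

The same total gives a sustained rate that differs by a factor of 30 between the two windows, which is why node counts and machine series cannot be recommended until the period and record size are known. Averages also hide peaks, so a real estimate would need the peak rate as well.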