
This article shows how to plan a NiFi cluster following best practices.


1) Hardware Provisioning

[Image: 38495-captura-de-tela-2017-09-02-as-155600.png]



2) Hardware Considerations for HDF

- General Hardware

A key design point of NiFi is that it runs on typical enterprise-class application servers.

Hardware failure:

  • Nodes are typically configured with RAID 5/10 to handle hardware failures through replication and redundancy.
  • More nodes mean less impact from any single failure.
  • More nodes provide increased throughput.

- Machine Class

A NiFi cluster consists of a single class of machine.

Balanced NiFi node:

  • 8 CPU cores per node minimum.
  • 6 hard disks per node minimum, spinning or SSD based on throughput requirements.
  • 8 GB of RAM per node minimum.
  • Designed for availability.
  • Typical enterprise-class application server.
  • Resilience built into the server itself (RAID).
  • Cost reduced where possible to strike the proper price/performance ratio, owing to volume.
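
The minimums above can be turned into a quick pre-flight check on candidate servers. The following is a minimal sketch, not an official tool: the `NodeSpec` class and the function names are hypothetical, and only the three numeric minimums (8 cores, 6 disks, 8 GB RAM) plus the "single class of machine" rule come from this article.

```python
from dataclasses import dataclass
from typing import List

# Minimums taken from the "Balanced NiFi node" list in this article.
MIN_CPU_CORES = 8
MIN_DISKS = 6
MIN_RAM_GB = 8


@dataclass
class NodeSpec:
    """Hypothetical description of one candidate server."""
    name: str
    cpu_cores: int
    disks: int
    ram_gb: int


def check_node(spec: NodeSpec) -> List[str]:
    """Return a list of shortfalls; an empty list means the minimums are met."""
    problems = []
    if spec.cpu_cores < MIN_CPU_CORES:
        problems.append(f"{spec.name}: {spec.cpu_cores} cores < {MIN_CPU_CORES}")
    if spec.disks < MIN_DISKS:
        problems.append(f"{spec.name}: {spec.disks} disks < {MIN_DISKS}")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"{spec.name}: {spec.ram_gb} GB RAM < {MIN_RAM_GB}")
    return problems


def check_cluster(specs: List[NodeSpec]) -> List[str]:
    """Check every node, then check that all nodes share one machine class."""
    problems = [p for spec in specs for p in check_node(spec)]
    shapes = {(s.cpu_cores, s.disks, s.ram_gb) for s in specs}
    if len(shapes) > 1:
        problems.append("cluster mixes more than one machine class")
    return problems


if __name__ == "__main__":
    cluster = [NodeSpec(f"nifi{i}", cpu_cores=16, disks=6, ram_gb=16) for i in (1, 2, 3)]
    print(check_cluster(cluster) or "all nodes meet the minimums")
```

Treating "machine class" as identical core, disk, and RAM counts is a simplification; a real inventory check would likely compare server models as well.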

- Networking

Network decisions play a role due to the clustered nature of data processing.

In-rack backplane/top-of-rack switch:

  • Keeps traffic local and reduces load on expensive aggregate switches.

Dual NIC recommended:

  • Depends on network requirements.

10G recommended:

  • Immediate cost vs. future-proofing.
  • Investment in 10G up front will survive the next 2-3 server hardware upgrades.
  • In-rack/top-of-rack switches allow Cat6 copper or Twinax to reduce 10G costs.


- NiFi: Hardware Driving Factors

NiFi is designed to take advantage of:

  • all the cores on a machine
  • all the network capacity
  • all the disk speed
  • many GB of RAM (though usually not all) on a system

Most important hardware factors:

  • Top-end disk throughput as configured, which is a combination of seek time and raw performance.
  • Network speed.
  • CPU is only a concern when there is a lot of compression, encryption, or media analytics.
  • The flow needs to take advantage of the contiguous block allocation approach NiFi uses; otherwise it results in many random seeks, which increases seek times and decreases effective throughput.
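
Since disk throughput and network speed are named as the dominant factors, a back-of-the-envelope ceiling for one node is the slower of the two. The sketch below is only an illustration under stated assumptions: the per-disk figure of 150 MB/s is a made-up example value, the model ignores protocol overhead, seek behaviour, and how disks are divided among repositories, and the function names are hypothetical.

```python
def gbit_to_mb_per_s(gbit: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units, no protocol overhead)."""
    return gbit * 1000 / 8


def node_ceiling_mb_per_s(disk_mb_per_s: float, disks: int, nic_gbit: float) -> float:
    """Rough upper bound for one node: the slower of aggregate disk and NIC bandwidth.

    Assumes (optimistically) that the flow spreads evenly over all disks.
    """
    disk_total = disk_mb_per_s * disks
    return min(disk_total, gbit_to_mb_per_s(nic_gbit))


if __name__ == "__main__":
    # Assumed example: 6 disks at 150 MB/s sequential each.
    print(node_ceiling_mb_per_s(150, 6, nic_gbit=10))  # disk-bound with a 10G NIC
    print(node_ceiling_mb_per_s(150, 6, nic_gbit=1))   # network-bound with a 1G NIC
```

With these example numbers a 1G NIC caps the node well below what the disks could sustain, which is one way to read the 10G recommendation; measured throughput on a real flow will be lower than either bound.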


3) HDF Disk Partition Baseline

[Image: 38496-captura-de-tela-2017-09-02-as-161133.png]



4) Disk Partitioning – NiFi Nodes (Repositories)

[Image: 38497-captura-de-tela-2017-09-02-as-161340.png]
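
Because the partitioning diagram is not reproduced here, the following is only a sketch of how repositories can be pointed at separate mounts in `nifi.properties`. The property names are standard NiFi ones, but the mount points and the helper function are hypothetical and do not necessarily match the layout in the original image.

```python
from typing import Dict, List


def repository_properties(
    flowfile_mount: str,
    content_mounts: List[str],
    provenance_mounts: List[str],
) -> Dict[str, str]:
    """Build nifi.properties entries that place each repository on its own mount."""
    props = {
        "nifi.flowfile.repository.directory": f"{flowfile_mount}/flowfile_repository",
    }
    # NiFi accepts several content repositories, one property per unique suffix.
    for i, mount in enumerate(content_mounts, start=1):
        props[f"nifi.content.repository.directory.content{i}"] = f"{mount}/content_repository"
    # The provenance repository follows the same naming pattern.
    for i, mount in enumerate(provenance_mounts, start=1):
        props[f"nifi.provenance.repository.directory.provenance{i}"] = f"{mount}/provenance_repository"
    return props


if __name__ == "__main__":
    # Hypothetical mount points, one per physical disk.
    layout = repository_properties(
        flowfile_mount="/flowfile_repo",
        content_mounts=["/content_repo1", "/content_repo2", "/content_repo3"],
        provenance_mounts=["/provenance_repo1"],
    )
    for key, value in layout.items():
        print(f"{key}={value}")
```

When named entries such as `content1` are used, the stock `nifi.content.repository.directory.default` and `nifi.provenance.repository.directory.default` lines should be replaced rather than left alongside them.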



5) NiFi: Default Cluster Recommendation

When no information is available to gauge the rate and complexity of the data flow, start with a default cluster of three nodes. Three nodes are needed for HA because of the ZooKeeper quorum process.
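
The three-node figure follows from ZooKeeper's majority quorum: the ensemble stays available only while more than half of its members are up. A small sketch of that arithmetic (the function names are illustrative, not part of any ZooKeeper API):

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Two members tolerate no failure at all, and four tolerate only one, the same as three, which is why odd ensemble sizes starting at three are the usual choice.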

The SKU is priced per core, but it can be split up. So, a 16 core SKU can be split into 3 machines of 4 cores each. More cores per node will improve throughput (up to a point).

So, a starting cluster for, say, 50 MB/s sustained throughput for an average flow is:

  • 3 nodes, each with:
    • CPU: 8+ cores (16 is preferred)
    • Memory: 8+ GB
    • Disk: 6 disks of 1 TB each (spinning or SSD)



6) NiFi Clusters Scale Linearly

[Image: 38498-captura-de-tela-2017-09-02-as-161501.png]
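
If linear scaling is taken at face value, the starting point from section 5 (three nodes for roughly 50 MB/s on an average flow) gives a first-pass node count for other targets. This is a rough sketch under that assumption only; the function name is hypothetical, and real sizing depends on flow complexity and should be confirmed by measurement.

```python
import math

# Baseline from section 5: 3 nodes for about 50 MB/s sustained, average flow.
BASELINE_NODES = 3
BASELINE_MB_PER_S = 50.0


def nodes_for_throughput(target_mb_per_s: float) -> int:
    """Estimate node count assuming throughput scales linearly with nodes.

    Never returns fewer than the three nodes needed for the ZooKeeper quorum.
    """
    needed = math.ceil(target_mb_per_s * BASELINE_NODES / BASELINE_MB_PER_S)
    return max(BASELINE_NODES, needed)


if __name__ == "__main__":
    for target in (10, 50, 60, 100, 200):
        print(f"{target} MB/s -> {nodes_for_throughput(target)} nodes")
```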

Comments

Have you checked NiFi throughput using the content repository in JBOD mode instead of RAID? Basically, let the application decide how to distribute the data.

Version history: revision 2 of 2, last updated 08-17-2019 11:23 AM