
Assistance Needed: Log Routing and Processing with NiFi

Explorer

Hello,

I need your guidance on the following scenario:

We have a SIEM (QRadar) infrastructure where Event Collectors receive logs from various data sources. These logs are correlated based on SIEM rules and use cases.

We plan to send logs to the SIEM while also storing a copy in a Data Lake.

Current Approach:

We have structured the workflow as follows:
DATA SOURCES → NiFi → Store in DATA LAKE & Forward to SIEM (using the PutTCP processor)

Questions:

  1. Does NiFi allow segregation of data sources?
  2. Handling LEEF logs:
    • Logs reach NiFi with the original source IP but leave NiFi with NiFi’s source IP.
    • Since QRadar first looks for the hostname in the payload (and if absent, uses the source IP), this could cause misidentification.
    • Can NiFi be configured to retain the original source IP while forwarding logs, without modifying the original log (to comply with legal requirements)?
  3. Log Integrity & Authenticity:
    • Does NiFi ensure log integrity and authenticity for legal and compliance purposes?
  4. LEEF Parsing:
    • Is there a NiFi processor available to parse LEEF logs before storing them in HDFS?

Thanks in advance.

 

1 ACCEPTED SOLUTION

Master Mentor

@MarinaM 

Welcome to the Cloudera Community.

Your Questions:

  1. Does NiFi allow segregation of data sources?
    - Would need a bit more information from you on this requirement/question. You can certainly create unconnected dataflows on the NiFi canvas to handle FlowFiles from various sources and keep them separate (one way to route by source is sketched after this list). However, on the NiFi backend there is no segregation of content: NiFi stores the content of one to many FlowFiles in content claims, and whenever new content is created it is written to the currently open content claim, no matter where in any of the dataflows that content is created. Content written to a content claim is immutable (it cannot be modified once written), so anywhere in your dataflow that you modify a FlowFile's content, the new version of the content is written to a new content claim.

  2. Handling LEEF logs:
    • Logs reach NiFi with the original source IP but leave NiFi with NiFi’s source IP.
      - Would need details on how you have these logs arriving at NiFi and being ingested.
    • Since QRadar first looks for the hostname in the payload (and if absent, uses the source IP), this could cause misidentification.
      - I am not familiar with QRadar, but perhaps you can modify the content when the hostname is missing in the payload via your NiFi dataflow(s)?
    • Can NiFi be configured to retain the original source IP while forwarding logs, without modifying the original log (to comply with legal requirements)?
      - NiFi is a data-agnostic tool and does not differentiate logs from any other content it is processing. Content in NiFi is just bytes of data, and it is up to any individual processor that needs to interact with a FlowFile's content to understand the content format. Would need to understand how you are ingesting these logs into NiFi. Some processors create FlowFile attributes containing the source IP information, which you could use later in your dataflow (see the second sketch after this list). Another option is to build your dataflow to do lookups on the source IP and modify the syslog header when the hostname is missing.
  3. Log Integrity & Authenticity:
    • Does NiFi ensure log integrity and authenticity for legal and compliance purposes?
      - As mentioned above, NiFi is data agnostic and content claims are immutable. Once a log is ingested, the dataflow(s) you build can modify content if designed to do so, and that modified log content is written to a new content claim. Some processors that modify content create an entirely new FlowFile referencing that content, while others just update the existing FlowFile to point at the new content in the new content claim while keeping the original FlowFile identifier. The former is typically the case with processors that have an "Original" relationship: the unmodified original FlowFile routes to that relationship, while the modified content is assigned to an entirely new FlowFile that becomes a child of the original. If you need demonstrable integrity for compliance, one possible approach is sketched after this list.
  4. LEEF Parsing:
    • Is there a NiFi processor available to parse LEEF logs before storing them in HDFS?
      - Based on this IBM doc on LEEF (https://www.ibm.com/docs/en/SS42VS_DSM/pdf/b_Leef_format_guide.pdf), LEEF logs consist of an RFC 5424 or RFC 3164 formatted syslog header, which can be parsed by the NiFi syslog processors (a parsing sketch follows this list):
      ListenSyslog
      ParseSyslog
      ParseSyslog5424
      PutSyslog - perhaps using PutSyslog instead of PutTCP can solve the source IP issue you encounter with PutTCP.
      There are also controller services that support these syslog formats:
      SyslogReader
      Syslog5424Reader
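
Regarding point 1, here is a minimal sketch of routing by data source, assuming the logs arrive via a listener such as ListenTCP (which adds a tcp.sender attribute holding the sender's IP). The IP addresses and route names below are placeholders you would replace with your own:

      RouteOnAttribute (Routing Strategy = Route to Property name)
        firewall_logs = ${tcp.sender:equals('10.0.0.5')}    <- placeholder IP
        proxy_logs    = ${tcp.sender:equals('10.0.0.6')}    <- placeholder IP

      Each dynamic property becomes its own relationship, so each source (or group of sources) can flow into its own branch of the canvas.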
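
Regarding point 3, NiFi itself does not certify authenticity. If you need demonstrable integrity for legal purposes, one possible approach (a sketch of an add-on technique, not something NiFi requires) is to hash each log at ingest time with the CryptographicHashContent processor and carry the digest with the Data Lake copy:

      CryptographicHashContent
        Hash Algorithm = SHA-256
        (writes the digest to a FlowFile attribute, e.g. content_SHA-256)

      The ingest-time digest can later be compared against a re-computed hash of the stored log to show it was not altered. NiFi's data provenance repository also records lineage events for every FlowFile, which can support audit trails.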
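
Regarding point 4, I am not aware of a dedicated LEEF processor, but since the LEEF payload after the syslog header is pipe-delimited, ExtractText can pull the header fields into attributes before PutHDFS. A rough sketch with hypothetical attribute names; the regexes would need adapting to your actual records:

      ExtractText (each dynamic property is a regex; capture group 1 lands in the attribute)
        leef.vendor  = LEEF:[^|]*\|([^|]*)\|
        leef.product = LEEF:[^|]*\|[^|]*\|([^|]*)\|
        leef.eventid = LEEF:[^|]*\|[^|]*\|[^|]*\|[^|]*\|([^|]*)\|

      The remaining tab-separated key=value pairs could be handled with further patterns or with a record-based reader (e.g. GrokReader) before writing to HDFS, depending on how structured the Data Lake copy needs to be.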

Please help our community grow and thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt


2 REPLIES

Community Manager

@MarinaM, welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts @MattWho and @satz, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
