Member since
10-17-2022
10
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
14477 | 10-25-2022 01:16 PM |
11-08-2022
11:55 AM
@D5ha Your issue is a mutual TLS handshake issue and really has nothing specific to do with NiFi itself. There are a lot of resources on the web for creating certificates. There are even free services like Tinycert you can use to generate valid certificate meeting the requirements I shared in my last response. Providing guidance on how to create certificates does not make much sense since it can be done so many ways: - Self-signed - public CA - Corporate/private CA etc. Your current shared TLS exception is telling you that the IP or Hostname (you have BLUE line through it in yoru image) was not found as a Subject Alternative Name (SAN) in the certificate created for the server side of this handshake which in yoru case happens to also be your NiFi instance. The Site-To-Site-Bulletin-Reporting-Task is acting as the client in this Mutual TLS handshake and the NiFi server S2S destination URL is the server side of this TLS handshake. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
10-28-2022
01:06 PM
@D5ha Not all processors write to the content repository nor is content of a FlowFile ever modified in the content after it is created. Once a FlowFile is created in NiFi it exists as is until terminated. A NiFi FlowFile consists of two parts, FlowFile Attributes (metatadata about the FlowFile which includes details about the FlowFIle's content location in the content_repository) and the FlowFile content itself. When a downstream processor modifies the content of a FlowFile, what is really happening is a new content is written to a new content claim in the content_repository, the original content still remains unchanged. From what you shared, you appear to have just one content_repository. Within that single content_repository, NiFi creates a bunch of sub-directories. NiFi does this because of the massive number of content claims a user's dataflow(s) may hold for better indexing and seeking. What is very important to also understand is that a content claim in the content_repository can hold the content for 1 or more FlowFiles. It is not always one content claim per FlowFiles content. It is also very possible to have multiple queued FlowFiles pointing to the exact same content claim and offset (same exact content). This happens when you dataflow clones a FlowFile (for example routing same outbound relationship from a processor multiple times). So you should never manually delete claims from any content repository as you may delete content for multiple FlowFiles. That being said, you can use data provenance to locate the content_repository (container), subdirectory (section), Content Claim filename(Identifier), Content offset byte where content begins in that claim (Offset), and number of bytes from offset to end of content in the claim (Size). Right click on a processor and select "view data provenance" from displayed context menu: This will list all FlowFiles for which provenance still holds index data on that were processed by this processor: Click the Show Lineage icon (looks like 3 connected circles) to the far right of a FlowFile. You can right click on "clone" and "join" events to find/expand any parent flowfiles in the lineage (the event dot created for the processor on which you said show provenance will be colored red in the lineage graph): Each white circle is a different FlowFile. clicking on a white circle will highlight dataflow path for that FlowFile. Right clicking on an event like "create" and selecting "view details" will tell you all about what is known about that FlowFile (this includes a tab about the "content"): Container corresponds to the following property in the nifi.properties file: nifi.content.repository.directory.default= Section corresponds to subdirectory within the above content repository path. Identifier is the content claim filename. Offset is the byte on which content for this FlowFile begins within that identifier. Size is number of bytes of you reach end of content for that FlowFile's content in the Identifier. I also created an article on how to index the Content Identifier. Indexing a field allows you to locate a content claim and the search for it in your data provenance to find all FlowFile(s) that pointed at it. You can then look view the details of all those FlowFile(s) to see full content calim details as above: https://community.cloudera.com/t5/Community-Articles/How-to-determine-which-FlowFiles-are-associated-to-the-same/ta-p/249185 If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
10-28-2022
06:56 AM
yes code is same for small tables it works fine also, here I need to query around (~200GB) data
... View more
10-24-2022
09:13 AM
@D5ha Sometimes it is useful to know more about your environment to include the full NiFi version and java versions. Since it is reporting issues as loading the flow: java.lang.Exception: Unable to load flow due to: java.util.zip.ZipException: invalid stored block lengths
at org.apache.nifi.web.server.JettyServer.start I would lean towards some issue/corruption of the flow.xml.gz and/or flow.json.gz on this node. Since all nodes run the same exact copy of these files, I'd copy them from a good node to the node failing to start. Depending on your NiFi version you may not have a flow.json.gz file (This format was introduced in the most recent versions). If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more