Member since: 02-01-2022
Posts: 274
Kudos Received: 97
Solutions: 60
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 402 | 05-15-2025 05:45 AM |
| | 3396 | 06-12-2024 06:43 AM |
| | 5926 | 04-12-2024 06:05 AM |
| | 4065 | 12-07-2023 04:50 AM |
| | 2184 | 12-05-2023 06:22 AM |
10-31-2022
09:33 AM
@steven-matison I have been trying to get ifElse working for me, but the below gives me an empty string: "". And this gives me null as a string: "null". Is there a way to return an actual null, not the string "null"?
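For reference, the general shape of an ifElse expression in NiFi Expression Language is sketched below; the attribute name myattr is hypothetical, since the original expressions did not survive in this post:

```
# ifElse(valueIfTrue, valueIfFalse) requires a boolean subject,
# e.g. isEmpty() or equals():
${myattr:isEmpty():ifElse('EMPTY', ${myattr})}
```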
10-28-2022
01:06 PM
@D5ha Not all processors write to the content repository, nor is the content of a FlowFile ever modified after it is created. Once a FlowFile is created in NiFi, it exists as-is until it is terminated. A NiFi FlowFile consists of two parts: FlowFile attributes (metadata about the FlowFile, which includes details about the location of the FlowFile's content in the content_repository) and the FlowFile content itself. When a downstream processor modifies the content of a FlowFile, what really happens is that new content is written to a new content claim in the content_repository; the original content remains unchanged.

From what you shared, you appear to have just one content_repository. Within that single content_repository, NiFi creates a number of sub-directories. NiFi does this for better indexing and seeking, because of the massive number of content claims a user's dataflow(s) may hold.

It is also very important to understand that a content claim in the content_repository can hold the content for one or more FlowFiles; it is not always one content claim per FlowFile's content. It is also very possible to have multiple queued FlowFiles pointing to the exact same content claim and offset (the exact same content). This happens when your dataflow clones a FlowFile (for example, routing the same outbound relationship from a processor multiple times). So you should never manually delete claims from any content repository, as you may delete content belonging to multiple FlowFiles.

That being said, you can use data provenance to locate the content repository (Container), subdirectory (Section), content claim filename (Identifier), the byte where the content begins in that claim (Offset), and the number of bytes from the offset to the end of the content in the claim (Size).

Right-click on a processor and select "View data provenance" from the displayed context menu. This will list all FlowFiles processed by this processor for which provenance still holds index data. Click the Show Lineage icon (looks like 3 connected circles) to the far right of a FlowFile. You can right-click on "clone" and "join" events to find/expand any parent FlowFiles in the lineage (the event dot created for the processor on which you selected data provenance will be colored red in the lineage graph). Each white circle is a different FlowFile; clicking on a white circle will highlight the dataflow path for that FlowFile. Right-clicking on an event like "create" and selecting "View details" will tell you everything that is known about that FlowFile, including a tab about the "content":

- Container corresponds to the following property in the nifi.properties file: nifi.content.repository.directory.default=
- Section corresponds to the subdirectory within the above content repository path.
- Identifier is the content claim filename.
- Offset is the byte at which this FlowFile's content begins within that Identifier.
- Size is the number of bytes from the Offset to the end of this FlowFile's content in the Identifier.

I also created an article on how to index the Content Identifier. Indexing that field allows you to locate a content claim and then search for it in your data provenance to find all FlowFile(s) that pointed at it. You can then view the details of all those FlowFile(s) to see the full content claim details as above: https://community.cloudera.com/t5/Community-Articles/How-to-determine-which-FlowFiles-are-associated-to-the-same/ta-p/249185

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click "Accept as Solution" below each response that helped. Thank you, Matt
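As an illustrative sketch (the repository root and claim values below are hypothetical, not taken from this thread), those provenance fields map onto the filesystem like this:

```
# nifi.properties -- Container maps to the repository root:
nifi.content.repository.directory.default=/opt/nifi/content_repository

# A FlowFile whose provenance content tab shows:
#   Container: default    Section: 432    Identifier: 1667913567234-1024
#   Offset: 2048          Size: 512
# has its content inside the claim file:
#   /opt/nifi/content_repository/432/1667913567234-1024
# starting 2048 bytes into that file and running for 512 bytes.
```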
10-28-2022
06:56 AM
Yes, the code is the same; for small tables it works fine. Here I need to query around ~200 GB of data.
10-28-2022
05:44 AM
1 Kudo
@sathish3389 Define a parameter context and a parameter ("parameter_password") for your flow with your password string, mark it as a sensitive value, then use the parameter in the processor property value: ${http.headers.Authorization:equals(#{parameter_password})} This will hide the password and make it easy to update by simply changing the parameter.
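A minimal sketch of how this could look in a RouteOnAttribute processor (the route name "authorized" is an illustrative placeholder):

```
# RouteOnAttribute -- add a dynamic property; its name becomes a relationship:
#   authorized = ${http.headers.Authorization:equals(#{parameter_password})}
# FlowFiles whose Authorization header matches the sensitive parameter are
# routed to "authorized"; everything else goes to "unmatched".
```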
10-28-2022
05:24 AM
@Ekodar You will need to use a driver to connect PHP to Impala. A quick search turned up this promising guide: https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf Here is another example with more details showing actual PHP code: https://www.cdata.com/kb/tech/impala-odbc-php.rst
10-25-2022
12:53 PM
@ryu CDP Public Cloud Azure, or CDP Private Cloud on Azure VMs? To link a NiFi outside of the cluster, you will need to provide that NiFi with the configuration files from the CDP cluster, for example core-site.xml and hdfs-site.xml. Beyond that configuration, you will need to do some networking to allow access between the systems, and then, last but not least, deal with access/auth and Kerberos. If you are already working on some of these areas, be sure to include screenshots of processors, controller services, configs, etc.
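As a sketch of where those files plug in (all paths and the principal below are hypothetical), the HDFS processors such as PutHDFS and ListHDFS take them via the Hadoop Configuration Resources property; depending on your NiFi version, Kerberos is set either directly on the processor or through a Kerberos credentials controller service:

```
# PutHDFS / ListHDFS processor properties (illustrative values):
#   Hadoop Configuration Resources = /opt/nifi/conf/core-site.xml,/opt/nifi/conf/hdfs-site.xml
#   Kerberos Principal             = nifi@EXAMPLE.COM
#   Kerberos Keytab                = /opt/nifi/conf/nifi.keytab
```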
10-24-2022
06:27 AM
@yoiun Going to go out on a limb here: it seems like the sqoop command and the Hue/Sqoop command are executed on different hosts. Does the new host have permissions to MySQL? This error: Access denied for user 'demo'@'152.30.119.754' leads me to believe it does not.
10-24-2022
06:18 AM
@i_am_dba This is a very difficult one to explain. I think the issue is the string schema, or the removal of the Avro schema you mentioned. My first suggestion would be to try to specify the schema explicitly, which should help get the data into the right formats. An alternate solution is to try to do that manually with ReplaceText/regex, etc., but that is not the ideal solution. That said, a higher-level suggestion is to update the upstream data source to permanently resolve the instability between '' (blank string), 'null' (string), and actual NULL (not a blank, '', or a string at all).
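As an illustrative sketch (the record and field names are hypothetical), an explicit Avro schema pasted into a record reader's Schema Text property can declare the unstable field as a nullable union, so an actual NULL survives instead of being coerced into a string:

```
{
  "type": "record",
  "name": "example_record",
  "fields": [
    { "name": "id",    "type": "string" },
    { "name": "value", "type": ["null", "string"], "default": null }
  ]
}
```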
10-24-2022
06:09 AM
@MaarufB Please make a new post with as much detail as you can around your question and use case. This is an old topic and will not get a good response in the comments. Feel free to @ tag me in the new post.
10-13-2022
08:05 AM
@sathish3389 It's not entirely clear what you are asking here, but I will give it a go. ListenHTTP is used to listen on an HTTP port with limited POST capabilities. If you are looking to post data to NiFi as more of a REST API, you may want to check out HandleHttpRequest and HandleHttpResponse, as they have a bit more capability, including SSL client authentication options. They also allow you to program authentication logic before returning the response. To do that, you would build your dataflow (after HandleHttpRequest) to look for an authentication header (user, password, key, etc.) and validate it; if it is valid, continue to HandleHttpResponse with 200 (success). An invalid authentication header would then go to HandleHttpResponse with 500 (error). An invalid request (wrong path, missing info, etc.) could be routed to HandleHttpResponse with 404 (not found). A sketch of that routing follows.
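A minimal sketch of that flow, assuming a hypothetical sensitive parameter #{api_password} and relying on HandleHttpRequest exposing request headers as FlowFile attributes prefixed with http.headers.:

```
# HandleHttpRequest -> RouteOnAttribute -> HandleHttpResponse
# RouteOnAttribute dynamic property (route name "valid_auth" is illustrative):
#   valid_auth = ${http.headers.Authorization:equals(#{api_password})}
#
# Connections:
#   "valid_auth" -> HandleHttpResponse with HTTP Status Code = 200
#   "unmatched"  -> HandleHttpResponse with HTTP Status Code = 500
```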