Member since: 07-19-2018
Posts: 613
Kudos Received: 100
Solutions: 117
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3144 | 01-11-2021 05:54 AM
 | 2244 | 01-11-2021 05:52 AM
 | 5999 | 01-08-2021 05:23 AM
 | 5573 | 01-04-2021 04:08 AM
 | 25782 | 12-18-2020 05:42 AM
09-02-2020
11:24 PM
Hi everyone, sorry about the confusion. It was late and I was actually looking at the wrong flowfile output, i.e. the top one on the list (oldest) instead of the bottom one (newest). @stevenmatison thank you for your reply and effort in making a template.
09-01-2020
09:16 AM
@stevenmatison Thanks for your answer. As my tables are relatively small and only used to duplicate existing data, is there any way to remove the existing folders before importing new data? Regards
08-28-2020
08:52 AM
@P_Rat98 The error above is saying there is an issue with the Schema Name in your record reader or writer. Inside the properties for ConvertRecord, click the --> arrow through to the reader/writer and make sure they are configured correctly. You will need to provide the correct schema name (if it already exists as an attribute) or provide the schema text; a hypothetical example of schema text follows below.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your use case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
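For reference, the Schema Text property of a record reader/writer takes an Avro schema. A minimal sketch, assuming a hypothetical two-field record (the field names and types are placeholders; match them to your actual data):

    {
      "type": "record",
      "name": "ExampleRecord",
      "fields": [
        { "name": "id", "type": "long" },
        { "name": "value", "type": ["null", "string"], "default": null }
      ]
    }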
08-28-2020
08:47 AM
@P_Rat98 You need to set the filename (Object Key) of each parquet file uniquely to save them as different S3 objects. If that processor is configured to just ${filename}, it will overwrite the same object on each execution. For the second option, if you have a split in your data flow, the split parts should carry key/value attributes identifying each fragment and the total number of fragments. Inspect your queue and list the attributes on the split flowfiles to find them. You use these attributes with MergeContent to merge everything back together into a single flowfile; see the sketch below this post. You need to do this before converting to parquet, not after.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your use case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
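A hedged configuration sketch, assuming PutS3Object writes the files and your split processor emitted NiFi's standard fragment attributes (verify the exact attribute names on your own flowfiles):

    PutS3Object
        Object Key: ${filename}-${fragment.index}    # or include ${UUID()} for guaranteed uniqueness

    MergeContent
        Merge Strategy: Defragment    # reassembles splits using fragment.identifier / fragment.index / fragment.count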
08-27-2020
08:22 AM
1 Kudo
Oh, that's great. Thanks for your response. That clears up my question.
08-27-2020
07:04 AM
I'd recommend these customers work with their account team to plan their CDP journey. I've dug into a number of customers facing this and found strategies for migrating/upgrading them to either public cloud, on-prem, or the recently released private cloud offering.
08-27-2020
06:24 AM
@derisrayan Your question is impossible to answer without a very detailed inspection of the following items:
- NiFi cluster size (# of nodes) and the spec of each node (CPU/RAM/disk)
- The size of the data processed per flowfile
- The number of pieces of data arriving per execution of the flow
After the above, the data flow's configuration for concurrency and parallelism is tuned to the NiFi cluster's performance capabilities. This comes down to total NiFi nodes, total cores, the configuration, and how many active threads the NiFi cluster can handle. With a nicely configured NiFi cluster (3+ nodes) with as much RAM and as many cores as possible, the transaction rates will be quite impressive. Scaling to 5, 10, or 15+ nodes will increase this to an impressive production-ready scale; a rough sizing sketch follows below.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your use case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
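A hypothetical back-of-envelope sizing sketch; the 2-4x-cores figure is the commonly cited community guideline for Maximum Timer Driven Thread Count, not a guarantee:

    3 nodes x 16 cores = 48 cores total
    Maximum Timer Driven Thread Count (per node): 2-4 x 16 = 32-64 threads
    Cluster-wide: roughly 96-192 concurrently active threads to divide among processors via their Concurrent Tasks settings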
08-27-2020
06:08 AM
1 Kudo
@P_Rat98 Creating an API with NiFi using HandleHttpRequest and HandleHttpResponse is something I have done quite a few times for Hortonworks and NiFi customers. This is a great use case for NiFi: sending/receiving JSON, processing it, and completing actions downstream is super easy. I have created a basic template for you which includes HandleHttpRequest (accepting the inbound call on port 80), a process group for doing something with the JSON, and HandleHttpResponse (providing the 200 response code) to answer the inbound call. This is an API in its simplest form with NiFi. Depending on your use case you can build out the Process Api Request process group to suit your needs. Out of the box you should be able to import the template, add and start the StandardHttpContextMap controller service, start the flow, send a call to http://yourhost:80, and have JSON sitting in the Success queue at the bottom of the flow; a minimal client sketch for testing this follows below. You can find the template here: https://github.com/steven-matison/NiFi-Templates/blob/master/NiFi_API_with_HandleHttpRequest_Demo.xml
Some API suggestions:
- Be sure to take a look at both HandleHttp processors for the properties you can configure: ports, hostname, acceptable methods, SSL, authentication, and more.
- If your API call does not care whether Process Api Request finishes, you can put HandleHttpResponse right after HandleHttpRequest and let all the downstream work happen after the request/response has completed. This is common when I expect my API to only receive inbound data and the caller does not care what the response is (other than a 200 to know it was received). In that case I accept the payload, return 200, and the rest of the flow is decoupled from the connection. If my processing time is lengthy I usually do this so the system initiating the API call is not left waiting.
- Once you have the basic framework built, consider handling errors and/or returning different status codes as a variable (created before the response) in the Status Code property of HandleHttpResponse. Sometimes I even have a different HandleHttpResponse at the end of each flow branch. For example: if someone sends invalid JSON, I return maybe a 302 or 404 with the validation error as the content body.
Have fun with it. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your use case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
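A minimal client sketch for exercising the template, assuming the flow is running and listening on yourhost port 80 (the hostname and JSON payload below are placeholders):

    import requests  # third-party client library: pip install requests

    # Hypothetical JSON payload; the template should accept any JSON body.
    payload = {"id": 1, "message": "hello nifi"}

    # POST to the HandleHttpRequest listener; HandleHttpResponse in the flow
    # answers with HTTP 200 once the request has been handed off.
    resp = requests.post("http://yourhost:80", json=payload, timeout=10)
    print(resp.status_code)  # expect 200 if the flow is running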
08-24-2020
09:28 AM
@Koffi If you have a NiFi flow created and tuned for a very large spec, and you downgrade that spec, you are going to have all kinds of problems like the ones you are experiencing. You will need to go into the flow, reduce concurrency and min/max thread pool settings, and completely re-tune the flow for the new environment, since you reduced the RAM and cores per node. Another suggestion: NiFi 1.7 is very dated. You should consider an upgrade to NiFi 1.12 and use at least 3 nodes.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your use case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
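A hedged checklist of where those tuning knobs live in the NiFi 1.x UI (verify the exact names in your version):

    - Global thread pools: hamburger menu -> Controller Settings -> Maximum Timer Driven Thread Count (and Maximum Event Driven Thread Count)
    - Per processor: Configure -> Scheduling tab -> Concurrent Tasks
    - Per connection: back-pressure Object Threshold and Data Size Threshold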
08-24-2020
05:55 AM
@K_K The error I notice in here is: Caused by: java.net.BindException: Address already in use
Some suggestions:
- Check your DNS and networking (/etc/hosts) and make sure it is all correct. If you make adjustments, restart networking or reboot the nodes, then restart the Ambari server and agent.
- Make sure ambari-server and ambari-agent do not already have something else running on the port that reports "address already in use"; a quick port check sketch follows below.
- Make sure the host(s) for YARN are the correct hosts. I see "0.0.0.0 port 53" in the error log; you will want to make sure YARN is using the right IP/address/host and not some form of localhost or 0.0.0.0.
I hope some of these help you arrive at the solution. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your use case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
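A minimal sketch for checking whether the conflicting port is actually free on a node, assuming port 53 from the log (binding ports below 1024 requires root):

    import socket

    PORT = 53  # the port reported in the BindException; adjust as needed

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Attempt to bind the port; failure means another process owns it.
        s.bind(("0.0.0.0", PORT))
        print(f"port {PORT} is free")
    except OSError as e:
        # Identify the owner with a tool such as `ss -tulpn` or `netstat -tulpn`.
        print(f"port {PORT} is in use: {e}")
    finally:
        s.close()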