Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4901 | 01-11-2021 05:54 AM
 | 3337 | 01-11-2021 05:52 AM
 | 8645 | 01-08-2021 05:23 AM
 | 8158 | 01-04-2021 04:08 AM
 | 36039 | 12-18-2020 05:42 AM
08-27-2020
06:35 AM
@Jayavardhini Yes, it is a best practice to auto-terminate successfully completed flowfiles at the bottom of your flow branches. You do not want them to remain, as they will continue to hold resources. A good NiFi developer builds a flow during development where all bottom branches are visible, including routing for all processor relationships, even ones that will eventually be auto-terminated. This gives visibility during testing and flow creation if something unexpected happens. I use stopped output ports for this purpose. In some of my production flows I create capture points for exceptions; these are bottom-branch process groups or queues where the flowfiles remain until someone inspects them, makes a change, inspects the provenance, and maybe even reroutes them back into the flow. This is the only case where I keep flowfiles in my flow. In all other cases I auto-terminate and the flowfiles are gone from my flows. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-27-2020
06:24 AM
@derisrayan Your question is impossible to answer without a detailed look at the following items:

- NiFi cluster size (# of nodes) and the spec of each node (CPU/RAM/disk)
- The size of the data processed per flowfile
- The number of pieces of data arriving per execution of the flow

After that, the data flow's concurrency and parallelism are tuned to the NiFi cluster's performance capabilities. This comes down to total NiFi nodes, total cores, the configuration, and how many active threads the cluster can handle. With a well-configured NiFi cluster (3+ nodes) with as much RAM and as many cores as possible, the transaction rates will be quite impressive. Scaling to 5, 10, 15+ nodes increases this to an impressive, production-ready scale. Thanks, Steven @ DFHZ
08-27-2020
06:18 AM
@Muffex My recommendation for an ingestion process is to always use staging/temporary tables that are managed separately from the master table the data needs to land in. This allows you to operate on the staging tables before or after those results are added to the master table, without affecting the master table. In your use case, the ingestion process would sqoop to a temp table, insert from temp into the master table, then drop the temp location. In some of my past implementations of this pattern, the temp tables were organized hourly and stayed active for at least 7 days before a decoupled cleanup job removed anything older than 7 days. That was done for auditing purposes; normally I would create and destroy the temp data during the ingestion procedure. Thanks, Steven @ DFHZ
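The sqoop-to-temp, insert-to-master, drop-temp sequence can be sketched as a small statement builder. This is a minimal illustration of the pattern only; the table names and the hourly staging-table naming scheme are hypothetical, and in practice these statements would be run by your ingestion tooling (Sqoop, Hive, etc.):

```python
from datetime import datetime

def staged_ingest_statements(master_table, run_time):
    """Build the HiveQL statements for one staged ingest run.

    The staging table name is derived from the run hour so each
    ingest lands in its own temp table (hypothetical naming scheme).
    """
    staging_table = f"{master_table}_stg_{run_time:%Y%m%d%H}"
    return [
        # 1. Sqoop (or any other ingest) writes into the staging table first.
        f"CREATE TABLE IF NOT EXISTS {staging_table} LIKE {master_table}",
        # 2. Validate/transform staging rows, then move them to master.
        f"INSERT INTO TABLE {master_table} SELECT * FROM {staging_table}",
        # 3. Drop the staging table (or defer this to a decoupled cleanup job
        #    that removes staging tables older than 7 days, for auditing).
        f"DROP TABLE IF EXISTS {staging_table}",
    ]

stmts = staged_ingest_statements("sales_master", datetime(2020, 8, 27, 6))
```

Keeping the drop step separate is what lets you switch between destroy-on-ingest and the 7-day audited retention described above.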
08-27-2020
06:08 AM
1 Kudo
@P_Rat98 Creating an API with NiFi using HandleHttpRequest and HandleHttpResponse is something I have done quite a few times for Hortonworks and NiFi customers. This is a great use case for NiFi: sending and receiving JSON, processing it, and completing actions downstream is super easy. I have created a basic template for you which includes HandleHttpRequest (inbound call on port 80), a process group for doing something with the JSON, and HandleHttpResponse (provides a 200 response code) to answer the inbound call. This is an API in its simplest form with NiFi. Depending on your use case you can build out the Process Api Request process group to suit your needs. Out of the box you should be able to import the template, add/start the StandardHttpContextMap controller service, start the flow, send a call to http://yourhost:80, and have JSON sitting in the Success queue at the bottom of the flow. You can find the template here: https://github.com/steven-matison/NiFi-Templates/blob/master/NiFi_API_with_HandleHttpRequest_Demo.xml

Some API suggestions:

- Be sure to look at both HandleHttp processors for the properties you can configure: ports, hostname, acceptable methods, SSL, authentication, and more.
- If your API call does not care whether the Process Api Request group finishes, you can put HandleHttpResponse right after HandleHttpRequest and let all the downstream work happen after the request/response is completed. This is common when I expect my API to only receive inbound data and the caller doesn't care what the response is (other than a 200 to know it was received). In that case I accept the payload, return 200, and the rest of the flow is decoupled from the connection. If my processing time is lengthy I usually do this so the system initiating the API call is not left waiting.
- Once you have the basic framework built, consider handling errors and/or returning different status codes as a variable (created before the response) in the Status Code property of HandleHttpResponse. Sometimes I even have a different HandleHttpResponse at the end of different flow branches. For example, if someone sends invalid JSON, I might return a 302 or 404 with the invalid error as the content body.

Have fun with it. Thanks, Steven @ DFHZ
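The accept-then-respond-immediately idea (return 200 first, do the heavy work later) can be illustrated outside NiFi with a plain HTTP server. This is a generic sketch of the decoupling pattern only, not how NiFi implements it; the handler and queue names are made up for the example:

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Downstream work is decoupled from the connection: the handler only
# parses the payload, queues it, and returns 200 right away.
work_queue = queue.Queue()

class IngestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            # Invalid JSON gets an error status with the reason as the
            # body, mirroring a failure-branch HandleHttpResponse.
            self.send_response(400)
            self.end_headers()
            self.wfile.write(b"invalid json")
            return
        work_queue.put(payload)   # the "rest of the flow" picks this up later
        self.send_response(200)   # caller only needs to know we received it
        self.end_headers()
        self.wfile.write(b"received")

    def log_message(self, *args):
        pass  # keep the sketch quiet

server = HTTPServer(("127.0.0.1", 0), IngestHandler)  # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
```

The caller gets its 200 as soon as the payload is queued, so lengthy processing never leaves the calling system waiting, which is exactly the reason for placing HandleHttpResponse directly after HandleHttpRequest.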
08-24-2020
09:28 AM
@Koffi If you have a NiFi flow created and tuned for a very large spec, and you then downgrade that spec, you are going to have all kinds of problems like the ones you are experiencing. You will need to go into the flow, reduce concurrency and the min/max thread pool settings, and completely re-tune the flow for the new environment, since you reduced the RAM and cores per node. Another suggestion: NiFi 1.7 is very dated. You should consider an upgrade to NiFi 1.12 and use at least 3 nodes. Thanks, Steven @ DFHZ
08-24-2020
05:55 AM
@K_K The error I notice in here is: Caused by: java.net.BindException: Address already in use

Some suggestions:

- Check your DNS and networking (/etc/hosts) and make sure it is all correct. If you make adjustments, restart networking or reboot the nodes, then restart the Ambari server and agent.
- Make sure ambari-server and ambari-agent do not already have something else running on the port that reports "address already in use".
- Make sure the host(s) for YARN are the correct hosts. I see "0.0.0.0 port 53" in the error log; you will want to make sure YARN is using the right IP/address/host and not some form of localhost or 0.0.0.0.

I hope some of these help you arrive at the solution. Thanks, Steven @ DFHZ
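A quick way to check the second suggestion above (whether something is already listening on a port, the condition behind "Address already in use") is a small socket probe. This is a generic sketch, not an Ambari tool; the host and port are whatever your failing service tried to bind:

```python
import socket

def port_in_use(host, port):
    """Return True if something is already listening on host:port.

    A successful connect means the address is taken, which is what makes
    a second service binding the same port fail with
    java.net.BindException: Address already in use.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0
```

Running this against the conflicting port before restarting the service tells you whether the old process (or some other daemon) still holds it.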
08-18-2020
04:17 AM
@saivenkatg55 Based on the error I see, the issue is with ZooKeeper, not NiFi: zookeeper CONNECTIONLOSS You can find more info in this post: https://community.cloudera.com/t5/Support-Questions/NiFi-Clustering-Issue-ConnectionLoss-Error/td-p/215778 Thanks, Steven @ DFHZ
08-17-2020
04:44 AM
@sppandita85BLR You should update your post with the issues you are having, including the exact error messages found in the Ambari server/agent logs. The process itself should be pretty easy: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/configuring-wire-encryption/content/set_up_two-way_ssl_between_ambari_server_and_ambari_agents.html To confirm this works, I edited the ambari-server properties file on an Ambari cluster I keep ready for testing, restarted ambari-server, and restarted ambari-agent. Thanks, Steven @ DFHZ
08-17-2020
04:34 AM
@vikrant_kumar24 I believe the solution you are looking for is to use ExtractText to check for a string matching the country in the first row. This uses a regex against the entire file content, and you only need one match to know which country it is. Using ExtractText to populate an attribute called "country", you would then use RouteOnAttribute to create the different country routes, for example: usa => ${country:equals("usa")}. Once your routes are defined you can pull them off RouteOnAttribute and send them down the separate flows you create for each country. You should also know that you can achieve the same check/define/route logic with QueryRecord. Either method is suitable, but the latter is more standard in the newest versions of NiFi. Thanks, Steven @ DFHZ
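The extract-then-route logic above can be sketched in plain code. This only illustrates what the ExtractText regex and the RouteOnAttribute conditions are doing; the country list and first-row layout (country code leading the row, e.g. "usa,2020-08-17,42") are assumptions for the example:

```python
import re

# Hypothetical ExtractText-style pattern: capture the country code at
# the start of the first row. One match is enough to classify the file.
COUNTRY_PATTERN = re.compile(r"^(usa|france|india)\b", re.IGNORECASE)

def extract_country(file_content):
    """Mimic ExtractText: return the country found on the first row, or None."""
    first_row = file_content.splitlines()[0] if file_content else ""
    match = COUNTRY_PATTERN.match(first_row)
    return match.group(1).lower() if match else None

def route(file_content, routes=("usa", "france", "india")):
    """Mimic RouteOnAttribute: map a flowfile to a named route,
    falling through to "unmatched" like the processor's unmatched
    relationship."""
    country = extract_country(file_content)
    return country if country in routes else "unmatched"
```

Each named route then corresponds to a separate downstream flow, with the unmatched relationship catching anything the regex did not classify.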
08-15-2020
04:46 AM
@Jarinek Some information about your NiFi configuration would help me be more accurate: for example min/max RAM, number of cores, disk configuration, etc. Information about your flow is important too: the processor/queue back pressure, concurrency, and run schedule all affect the performance. Without this information, it sounds like your test load is exceeding the inbound capability of the flow as tuned (NiFi config, processor/queue config). You should look to increase concurrency, and increase queue size and back pressure based on the number of flowfiles moving through your data flow. You should also inspect the min/max thread counts, as these have a major impact on performance. All of these items will be seriously limited on a single node, so be mindful of your expectations. If you can, I would recommend a small 3-node NiFi cluster to evaluate NiFi performance in a better test environment where you can really turn up the performance and distribute the workload across 3 nodes. With 3 times as many cores and as much RAM you can make better use of the min/max thread counts, increase concurrency much higher, and you should see the stability you are expecting. Thanks, Steven @ DFHZ