Member since
07-19-2018
613
Posts
101
Kudos Received
117
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5687 | 01-11-2021 05:54 AM |
| | 3812 | 01-11-2021 05:52 AM |
| | 9487 | 01-08-2021 05:23 AM |
| | 9288 | 01-04-2021 04:08 AM |
| | 38605 | 12-18-2020 05:42 AM |
08-24-2020
09:28 AM
@Koffi If you have a NiFi flow created and tuned for a very large spec, and you downgrade that spec, you are going to have all kinds of problems like the ones you are experiencing. You will need to go into the flow, reduce concurrency and the min/max thread pool settings, and completely re-tune the flow for the new environment, since you reduced the RAM and cores per node. Another suggestion: NiFi 1.7 is very dated. You should consider an upgrade to NiFi 1.12 and use at least 3 nodes. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-17-2020
04:34 AM
@vikrant_kumar24 I believe the solution you are looking for is to use ExtractText to check for a string matching the country you want in the first row. This uses regex to match against the entire file, and you only need 1 match to know what country it is. Using ExtractText to get an attribute called "country", you would then use RouteOnAttribute to create different country routes. For example: usa => ${country:equals('usa')}. Once your routes are defined, you can pull them off RouteOnAttribute and send them down the separate flows you create for each country. You should also know that you can achieve the same check/define/route logic by using QueryRecord. Either method is suitable, but the latter is more standard in the newest versions of NiFi. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
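The extract-and-route logic above can be sketched in plain Python (the regex, the country list, and the "unmatched" route are illustrative assumptions; inside NiFi this is done by the ExtractText and RouteOnAttribute processors, not custom code):

```python
import re
from typing import Optional

# Illustrative pattern and country list (assumptions, not from the original
# post). ExtractText would apply a pattern like this to the flowfile content
# and store the first capture group in an attribute called "country".
COUNTRY_PATTERN = re.compile(r"^(usa|france|germany)", re.IGNORECASE)

def extract_country(content: str) -> Optional[str]:
    """Mimic ExtractText: a single regex match against the content."""
    match = COUNTRY_PATTERN.search(content)
    return match.group(1).lower() if match else None

def route(content: str) -> str:
    """Mimic RouteOnAttribute: map the extracted attribute to a route name."""
    return extract_country(content) or "unmatched"

print(route("usa,123,abc\nusa,456,def"))  # usa
print(route("belgium,999,zzz"))           # unmatched
```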
08-15-2020
04:46 AM
@Jarinek Some information about your NiFi configuration will help me be more accurate. For example: min/max RAM, number of cores, disk configuration, etc. Information about your flow is important too. The processor/queue back pressure, concurrency, and run duration all affect the performance you get. Without this information, it sounds like your test load is exceeding the inbound capabilities of the flow tuning (NiFi config, processor/queue config). You should look to increase concurrency, queue size, and back pressure based on the number of flowfiles moving through your data flow. You should also inspect the min/max thread counts, as these have a major impact on performance. All of these items will be seriously limited with a single node, so be mindful of your expectations. If you can, I would recommend a small 3 node NiFi cluster to evaluate NiFi performance in a better test environment, where you can really turn up the performance and distribute the workload across 3 nodes. With 3 times as many cores and as much RAM, you can make better use of the min/max thread counts, increase concurrency much higher, and you should see the stability you are expecting. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-15-2020
04:35 AM
@avi166 I think by the time you get to RouteOnAttribute you should have already read the file, but there isn't really a right or wrong answer. One of the things I like most about NiFi is that there are many different ways to achieve the same end result. To answer your next question: you may be able to use the same flow for different CSV file structures, and you should if you can, but don't be afraid to split the flow again as I have outlined above. These branches may need different schemas and different record readers, but the same record writer. Rejoin them again when the files are ready to converge into the same processor or branch of functionality. I also tried to point out that at first you may, for example, have to route some CSVs to a different file-structure branch. Then, by finishing all the CSV branches and knowing the differences between them, you should be able to make a final, more dynamic branch to replace 1 or more previous branches. These are tuning and optimization steps that you really won't know about until you evaluate that final flow branch against the previous versions.
08-14-2020
06:15 AM
@avi166 This is a common use case for NiFi: create a data flow that is a single entry point for data files of different expected types, up to and including all types. For example, you can create an API with HandleHttpRequest/HandleHttpResponse to accept a POST of a file. Another example is using GetFile/ListFile/etc. at the top of a flow to read a directory. Another common, newer example is getting the files from Amazon S3. After the top of the flow where files arrive inbound, it is common to create a single flow with a single branch for a specific use case. This is how you have created it for CSV. To improve your flow, you would add RouteOnAttribute to check whether the file name ends in "csv". This creates a "csv" route which you would then direct down the flow you created. Next, you similarly split the flow for the other types (TXT, AVRO, etc.), plus one route for unmatched types. Once the split is made, you can create separate branches (data flows) for each and add the additional processors needed to prepare each type for insertion. Sometimes you can create a branch that handles multiple types too. Some split branches may take 3-5+ processors to prepare for DB2, while others may need just 1 or 2 to prepare the data. When all the different data flow branches are ready, you route them all back to a single processor or process group to handle the insert into DB2. So you have a flow with a single entry that splits into many branches and then rejoins at the bottom. While working and operating in this manner, you may make separate flow branches and realize later that you could combine them by making a new branch that is a little more dynamic. You should always be looking to improve your flows over time in this manner. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me.
If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
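The split-by-type routing described above can be sketched in Python (the extension list and the "unmatched" route name are assumptions for illustration; in NiFi this check is done by RouteOnAttribute against the filename attribute, e.g. ${filename:endsWith('.csv')}):

```python
import os

# Routes correspond to the branches described above; anything else falls
# through to an "unmatched" branch for inspection or failure handling.
KNOWN_TYPES = {".csv": "csv", ".txt": "txt", ".avro": "avro"}

def route_by_extension(filename: str) -> str:
    """Mimic RouteOnAttribute: pick a branch from the file extension."""
    _, ext = os.path.splitext(filename.lower())
    return KNOWN_TYPES.get(ext, "unmatched")

print(route_by_extension("orders_2020.CSV"))  # csv
print(route_by_extension("data.parquet"))     # unmatched
```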
08-14-2020
05:55 AM
1 Kudo
Please accept the solution as the answer. Doing this helps complete the solution.
08-13-2020
08:11 AM
@JonnyL I would highly recommend that you back up and create a small 3 node NiFi cluster to test this feature. Putting 2 NiFi instances on a single node does not satisfy the test cases you really want to be experimenting with. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-13-2020
05:11 AM
@ManuN Any way you go about this task, you are going to have to execute commands against the tables to get their sizes. With a large number of tables this should be a script, program, or process. The common method is to query the table with Hive:

```sql
-- gives all properties
SHOW TBLPROPERTIES yourTableName;

-- show just the raw data size
SHOW TBLPROPERTIES yourTableName("rawDataSize");
```

Or, the most accurate approach is to look at the table location in HDFS:

```shell
hdfs dfs -du -s -h /path/to/table
```

There are also some methods to try to get this data directly from the Hive Metastore, assuming the table is an internal Hive table. In the past I have completed this with a basic bash/shell script. I have also done something similar in NiFi, and I prefer to do it that way without coding. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
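If you script the HDFS approach over many tables, the parsing step can be sketched like this in Python (the sample output lines and warehouse paths are illustrative; plain `hdfs dfs -du -s` output is size, size-with-replication, then path):

```python
from typing import Dict, Tuple

def parse_du_line(line: str) -> Tuple[str, int]:
    """Parse one line of `hdfs dfs -du -s /path/to/table` output.
    Plain (non -h) output is: <size> <size-with-replication> <path>."""
    parts = line.split()
    return parts[-1], int(parts[0])

def table_sizes(du_output: str) -> Dict[str, int]:
    """Collect per-table sizes from the concatenated du output."""
    return dict(parse_du_line(l) for l in du_output.strip().splitlines())

# Hypothetical captured output for two tables:
sample = """1048576  3145728  /warehouse/db/table_a
524288  1572864  /warehouse/db/table_b"""
print(table_sizes(sample))
```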
08-13-2020
05:04 AM
1 Kudo
@Seetha This is a very common use case for NiFi and JSON processing pipelines. Here is a link that explains a solution (ExecuteScript) you could use: https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-How-to-calculate-SUM-or-AVERAGE-of-values-in-a/td-p/164131 Additionally, @mburgess in that post links a JIRA for a new processor he was trying to work on at the time. The end result of that JIRA is his recommendation that the QueryRecord processor should give you the ability to calculate the sum. Using QueryRecord, you would read the values and be able to create a fabricated SQL query to calculate the sums. Then you would use a RecordWriter to re-write the original JSON object with the sums, or to create a completely different JSON object with the sums. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
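What the QueryRecord/RecordWriter combination computes can be sketched in Python (the record shape and the "amount" field are hypothetical, invented for illustration):

```python
import json

# Hypothetical input: a JSON array of records with an "amount" field.
# QueryRecord would do this with a fabricated query such as:
#   SELECT SUM(amount) AS total FROM FLOWFILE
records = json.loads('[{"id": 1, "amount": 10.5}, {"id": 2, "amount": 4.5}]')

total = sum(r["amount"] for r in records)
average = total / len(records)

# Emit a new JSON object carrying the computed values, as the RecordWriter
# step would when creating a completely different output object.
print(json.dumps({"total": total, "average": average}))
```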
08-13-2020
04:53 AM
@ang_coder Depending on the number of unique values you need to add, UpdateAttribute + expression language will allow you to create flowfile attributes based on the table results, in a manner I would call "manual". These can be used in routing, or in further manipulating the content (the original database rows) according to your match logic. For example, with ReplaceText you can replace the original value with the original value + the new value. Additionally, during your flow you can programmatically change the content of the flowfile to add the new column, using the attribute from above or a fabricated query. In the latter case you would use a RecordReader/RecordWriter/UpdateRecord on your data. In a nutshell, you create a translation on the content that includes adding the new field. This is a common use case for NiFi, and there are many different ways to achieve it. To get a more complete reply that better matches your use case, you should provide more information: sample input data, the expected output data, your flow, a template of your flow, and maybe what you have tried already. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
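The add-a-column translation described above can be sketched in Python (the row shape, the "region" field, and the attribute value are hypothetical examples; in NiFi, UpdateRecord with a RecordReader/RecordWriter applies this same per-record change):

```python
import json

# Hypothetical flowfile content: rows queried from the database, plus an
# attribute value looked up earlier in the flow (hard-coded here; in NiFi it
# would come from UpdateAttribute).
rows = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')
region_attribute = "emea"

# Every record gains the new field, the way UpdateRecord would apply a
# single translation across the whole record set.
for row in rows:
    row["region"] = region_attribute

print(json.dumps(rows))
```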