Member since
07-19-2018
613
Posts
101
Kudos Received
117
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5687 | 01-11-2021 05:54 AM |
| | 3812 | 01-11-2021 05:52 AM |
| | 9487 | 01-08-2021 05:23 AM |
| | 9288 | 01-04-2021 04:08 AM |
| | 38605 | 12-18-2020 05:42 AM |
08-24-2020
09:28 AM
@Koffi If you have a NiFi flow created and tuned for a very large spec, and you downgrade that spec, you are going to have all kinds of problems like the ones you are experiencing. You will need to go into the flow, reduce concurrency and the min/max thread pool settings, and completely re-tune the flow for the new environment, since you reduced the RAM and cores per node. Another suggestion: NiFi 1.7 is very dated. You should consider an upgrade to NiFi 1.12 and use at least 3 nodes. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-17-2020
04:34 AM
@vikrant_kumar24 I believe the solution you are looking for is to use ExtractText to check for a string matching the country you want in the first row. This uses regex to match against the entire file, and you only need 1 match to know what country it is. Using ExtractText to get an attribute called "country", you would then use RouteOnAttribute to create different country routes. For example: usa => ${country:equals('usa')}. Once your routes are defined, you can pull them off RouteOnAttribute and send them down the separate flows you create for each country. You should also know that you can achieve the same check/define/route logic by using QueryRecord. Either method is suitable, but the latter is more standard in the newest versions of NiFi. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
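The extract-and-route logic above can be sketched in plain Python (the regex, the country list, and the "unmatched" route are illustrative assumptions; inside NiFi this is done by the ExtractText and RouteOnAttribute processors, not custom code):

```python
import re
from typing import Optional

# Illustrative pattern and country list (assumptions, not from the original
# post). ExtractText would apply a pattern like this to the flowfile content
# and store the first capture group in an attribute called "country".
COUNTRY_PATTERN = re.compile(r"^(usa|france|germany)", re.IGNORECASE)

def extract_country(content: str) -> Optional[str]:
    """Mimic ExtractText: a single regex match against the content."""
    match = COUNTRY_PATTERN.search(content)
    return match.group(1).lower() if match else None

def route(content: str) -> str:
    """Mimic RouteOnAttribute: map the extracted attribute to a route name."""
    return extract_country(content) or "unmatched"

print(route("usa,123,abc\nusa,456,def"))  # usa
print(route("belgium,999,zzz"))           # unmatched
```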
08-15-2020
04:46 AM
@Jarinek Some information about your NiFi configuration will help me be more accurate. For example: min/max RAM, number of cores, disk configuration, etc. Information about your flow is important too. The processor/queue back pressure, concurrency, and run duration all affect the performance you get. Without this information, it sounds like your test load is exceeding the inbound capabilities of the flow tuning (NiFi config, processor/queue config). You should look to increase concurrency, queue size, and back pressure based on the number of flowfiles moving through your data flow. You should also inspect the min/max thread counts, as these have a major impact on performance. All of these items will be seriously limited with a single node, so be mindful of your expectations. If you can, I would recommend a small 3 node NiFi cluster to evaluate NiFi performance in a better test environment, where you can really turn up the performance and distribute the workload across 3 nodes. With 3 times as many cores and as much RAM, you can make better use of the min/max thread counts, increase concurrency much higher, and you should see the stability you are expecting. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-15-2020
04:35 AM
@avi166 I think by the time you get to RouteOnAttribute you should have already read the file, but there isn't really a right or wrong answer. One of the things I like most about NiFi is that there are many different ways to achieve the same end result. To answer your next question: you may be able to use the same flow for different CSV file structures, and you should if you can, but don't be afraid to split the flow again as I have outlined above. These branches may need different schemas and different record readers, but the same record writer. Rejoin them again when the files are ready to converge into the same processor or branch of functionality. I also tried to point out that at first you may, for example, have to route some CSVs to a different file-structure branch. Then, by finishing all the CSV branches and knowing the differences between them, you should be able to make a final, more dynamic branch to replace 1 or more previous branches. These are tuning and optimization steps that you really won't know about until you evaluate that final flow branch against the previous versions.
08-14-2020
06:15 AM
@avi166 This is a common use case for NiFi: create a data flow that is a single entry point for data files of different expected types, up to and including all types. For example, you can create an API with HandleHttpRequest/HandleHttpResponse to accept a POST of a file. Another example is using GetFile/ListFile/etc. at the top of a flow to read a directory. Another common, newer example is getting the files from Amazon S3. After the top of the flow where files arrive inbound, it is common to create a single flow with a single branch for a specific use case. This is how you have created it for CSV. To improve your flow, you would add RouteOnAttribute to check whether the file name ends in "csv". This creates a "csv" route which you would then direct down the flow you created. Next, you similarly split the flow for the other types (TXT, AVRO, etc.), plus one route for unmatched types. Once the split is made, you can create separate branches (data flows) for each and add the additional processors needed to prepare each type for insertion. Sometimes you can create a branch that handles multiple types too. Some split branches may take 3-5+ processors to prepare for DB2, while others may need just 1 or 2 to prepare the data. When all the different data flow branches are ready, you route them all back to a single processor or process group to handle the insert into DB2. So you have a flow with a single entry that splits into many branches and then rejoins at the bottom. While working and operating in this manner, you may make separate flow branches and realize later that you could combine them by making a new branch that is a little more dynamic. You should always be looking to improve your flows over time in this manner. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me.
If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
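The split-by-type routing described above can be sketched in Python (the extension list and the "unmatched" route name are assumptions for illustration; in NiFi this check is done by RouteOnAttribute against the filename attribute, e.g. ${filename:endsWith('.csv')}):

```python
import os

# Routes correspond to the branches described above; anything else falls
# through to an "unmatched" branch for inspection or failure handling.
KNOWN_TYPES = {".csv": "csv", ".txt": "txt", ".avro": "avro"}

def route_by_extension(filename: str) -> str:
    """Mimic RouteOnAttribute: pick a branch from the file extension."""
    _, ext = os.path.splitext(filename.lower())
    return KNOWN_TYPES.get(ext, "unmatched")

print(route_by_extension("orders_2020.CSV"))  # csv
print(route_by_extension("data.parquet"))     # unmatched
```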
08-14-2020
05:55 AM
1 Kudo
Please accept the solution as the answer. Doing this helps complete the solution.
08-13-2020
08:11 AM
@JonnyL I would highly recommend that you back up and create a small 3 node NiFi cluster to test this feature. Putting 2 NiFi instances on a single node does not satisfy the test cases you really want to be experimenting with. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-13-2020
05:11 AM
@ManuN Any way you go about this task, you are going to have to execute commands against the tables to get their sizes. With a large number of tables this should be a script, program, or process. The common method is to query the table with Hive:

```sql
-- gives all properties
SHOW TBLPROPERTIES yourTableName;

-- show just the raw data size
SHOW TBLPROPERTIES yourTableName("rawDataSize");
```

Or, the most accurate approach is to look at the table location in HDFS:

```shell
hdfs dfs -du -s -h /path/to/table
```

There are also some methods to try to get this data directly from the Hive Metastore, assuming the table is an internal Hive table. In the past I have completed this with a basic bash/shell script. I have also done something similar in NiFi, and I prefer to do it that way without coding. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
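If you script the HDFS approach over many tables, the parsing step can be sketched like this in Python (the sample output lines and warehouse paths are illustrative; plain `hdfs dfs -du -s` output is size, size-with-replication, then path):

```python
from typing import Dict, Tuple

def parse_du_line(line: str) -> Tuple[str, int]:
    """Parse one line of `hdfs dfs -du -s /path/to/table` output.
    Plain (non -h) output is: <size> <size-with-replication> <path>."""
    parts = line.split()
    return parts[-1], int(parts[0])

def table_sizes(du_output: str) -> Dict[str, int]:
    """Collect per-table sizes from the concatenated du output."""
    return dict(parse_du_line(l) for l in du_output.strip().splitlines())

# Hypothetical captured output for two tables:
sample = """1048576  3145728  /warehouse/db/table_a
524288  1572864  /warehouse/db/table_b"""
print(table_sizes(sample))
```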
08-13-2020
05:04 AM
1 Kudo
@Seetha This is a very common use case for NiFi and JSON processing pipelines. Here is a link that explains a solution (ExecuteScript) you could use: https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-How-to-calculate-SUM-or-AVERAGE-of-values-in-a/td-p/164131 Additionally, @mburgess in that post links a JIRA for a new processor he was trying to work on at the time. The end result of that JIRA is his recommendation that the QueryRecord processor should give you the ability to calculate the sum. Using QueryRecord, you would read the values and be able to create a fabricated SQL query to calculate the sums. Then you would use a RecordWriter to re-write the original JSON object with the sums, or to create a completely different JSON object with the sums. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
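What the QueryRecord/RecordWriter combination computes can be sketched in Python (the record shape and the "amount" field are hypothetical, invented for illustration):

```python
import json

# Hypothetical input: a JSON array of records with an "amount" field.
# QueryRecord would do this with a fabricated query such as:
#   SELECT SUM(amount) AS total FROM FLOWFILE
records = json.loads('[{"id": 1, "amount": 10.5}, {"id": 2, "amount": 4.5}]')

total = sum(r["amount"] for r in records)
average = total / len(records)

# Emit a new JSON object carrying the computed values, as the RecordWriter
# step would when creating a completely different output object.
print(json.dumps({"total": total, "average": average}))
```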
08-13-2020
04:53 AM
@ang_coder Depending on the number of unique values you need to add, UpdateAttribute + expression language will allow you to create flowfile attributes based on the table results, in a manner I would call "manual". These can be used in routing, or in further manipulating the content (the original database rows) according to your match logic. For example, with ReplaceText you can replace the original value with the original value + the new value. Additionally, during your flow you can programmatically change the content of the flowfile to add the new column, using the attribute from above or a fabricated query. In the latter case you would use a RecordReader/RecordWriter/UpdateRecord on your data. In a nutshell, you create a translation on the content that includes adding the new field. This is a common use case for NiFi, and there are many different ways to achieve it. To get a more complete reply that better matches your use case, you should provide more information: sample input data, the expected output data, your flow, a template of your flow, and maybe what you have tried already. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
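The add-a-column translation described above can be sketched in Python (the row shape, the "region" field, and the attribute value are hypothetical examples; in NiFi, UpdateRecord with a RecordReader/RecordWriter applies this same per-record change):

```python
import json

# Hypothetical flowfile content: rows queried from the database, plus an
# attribute value looked up earlier in the flow (hard-coded here; in NiFi it
# would come from UpdateAttribute).
rows = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')
region_attribute = "emea"

# Every record gains the new field, the way UpdateRecord would apply a
# single translation across the whole record set.
for row in rows:
    row["region"] = region_attribute

print(json.dumps(rows))
```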