Member since 11-16-2015
Posts: 902
Kudos Received: 664
Solutions: 249
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 127 | 09-30-2025 05:23 AM
 | 597 | 06-26-2025 01:21 PM
 | 441 | 06-19-2025 02:48 PM
 | 681 | 05-30-2025 01:53 PM
 | 9668 | 02-22-2024 12:38 PM
11-02-2016 04:45 PM · 2 Kudos
The Hive processors share some code with the Hadoop processors (in terms of Kerberos handling, etc.), so they expect "hadoop.security.authentication" to be set to "kerberos" in your config file(s) (e.g., core-site.xml, hive-site.xml).
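For reference, here is a minimal sketch of how that property looks in core-site.xml (your file will contain other properties as well):

```xml
<property>
  <!-- Tells Hadoop/Hive client code (including the NiFi processors)
       that the cluster is Kerberized -->
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```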
11-02-2016 04:39 PM · 2 Kudos
QueryDatabaseTable does indeed treat DECIMAL and NUMERIC types as strings in the outgoing Avro; there is a Jira (NIFI-2624) to improve the handling of these types. In the meantime, you might be able to use ConvertAvroSchema, but you won't be able to support BigDecimal values there either; it only supports conversion to/from int, long, double, and float. If your values fit in a double, that might work for now.
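If the double workaround is acceptable, the output Avro schema for ConvertAvroSchema could declare the numeric column as a double. This is only a sketch; the record and field names below are made up for illustration:

```json
{
  "type": "record",
  "name": "my_table",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "price", "type": "double" }
  ]
}
```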
10-28-2016 05:00 PM · 2 Kudos
Do they need to be separate fetches? If you use a single ExecuteSQL with JOINs for the foreign keys, you can get a single result set (in Avro), then use ConvertAvroToJSON to convert it to a single JSON object.

If they must be in different flow files, there is currently no "MergeJSON" processor, although that would be a great contribution if you're interested in writing a full processor. An alternative is to use ExecuteScript or InvokeScriptedProcessor. In either case, keep in mind that NiFi employs a flow-based paradigm, so merging arbitrary incoming flow files can be tricky. Some splitting processors (such as SplitText) support this by setting the "fragment.identifier", "fragment.count", and "fragment.index" attributes on the flow files, so a downstream "merging" processor can handle these micro-batches by merging together all files that share the same fragment.identifier. I've got an example of this kind of merging processor as a pull request for NIFI-2735.

If you're using a scripting processor and just want to solve this one specific issue, you could assume that you will only ever get 3 incoming flow files and merge them accordingly. This is fragile but could work for your use case. See the sketch below.
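Here is a minimal ExecuteScript (Groovy) sketch of that "assume exactly 3 flow files" approach. The batch size of 3, the naive map merge, and the assumption that each flow file holds a single JSON object are all illustrative, not part of any NiFi API:

```groovy
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.OutputStreamCallback

// Assumption for illustration: each batch is exactly 3 related flow files,
// each containing a single JSON object.
def flowFiles = session.get(3)
if (!flowFiles || flowFiles.size() < 3) {
    // Not all 3 have arrived yet; return them to the queue and try again later
    session.rollback(false)
    return
}

def slurper = new JsonSlurper()
def merged = [:]
flowFiles.each { ff ->
    session.read(ff, { inputStream ->
        // Naive merge: keys from later files overwrite earlier ones
        merged.putAll(slurper.parse(inputStream) as Map)
    } as InputStreamCallback)
}

// Write the merged object to a new flow file (a child of the first input)
def outFlowFile = session.create(flowFiles[0])
outFlowFile = session.write(outFlowFile, { outputStream ->
    outputStream.write(JsonOutput.toJson(merged).getBytes(StandardCharsets.UTF_8))
} as OutputStreamCallback)

session.transfer(outFlowFile, REL_SUCCESS)
session.remove(flowFiles)
```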
10-27-2016 01:07 AM · 1 Kudo
You've got a single SplitText in your example; you might get better performance out of multiple SplitTexts that reduce the size incrementally. Maybe 100,000 lines is too many; perhaps split into 10,000, then 100 (or 1, etc.). If you have a single file coming from FetchFile, you won't see performance improvements from multiple tasks unless you break down the huge file as described. Even then, with such a large single input file, you might see a bottleneck at some point due to the sheer amount of data moving through the pipeline, and multi-threading will not help a single SplitText with a single input. With multiple SplitTexts (and the "downstream" ones having multiple threads/tasks), you may find some improvement in throughput.
10-26-2016 06:11 PM
Use multiple SplitTexts just to get the size of each flow file down to a manageable number of lines (not 1 as I suggested above, but not the whole file either). Then use RouteText with the Grouping Regular Expression the way you have it, plus multiple dynamic properties (similar to your TagName above), each with a value you want to match:

- Tag1 with value ABC04.PI_B04_EX01_A_STPDATTRG.F_CV
- Tag2 with value ABC05.X4_WET_MX_DDR.F_CV
- ...etc.

Once you apply the changes and reopen the dialog, you should see relationships like Tag1 and Tag2; you can then route those relationships to the appropriate branch of the flow. In each branch, you may need multiple MergeContents like @mclark describes above, to incrementally build up larger files. At the end of each branch, you should have a flow file full of entries with the same tag name.

An alternative is to use SplitTexts down to 1 line per flow file, then ExtractText to put the tag name in an attribute, then RouteOnAttribute to route the files, then MergeContents to build up a single file with all the lines having the same tag name. This seems slower to me, so I'm hoping the first solution works.
10-26-2016 05:11 PM
Also, you should use a series of SplitText processors in a row rather than one: the first could split into chunks of 100,000 lines or so, the next into 1,000, and the next into 1. Those numbers (and the number of SplitTexts) can be tuned for your dataset, but this should prevent any single processor from hanging or running out of memory.
10-26-2016 12:52 PM
As @Bryan Bende has said, it isn't possible with those processors and/or the framework. However, you could emulate this part of the flow with something like ExecuteScript, but you'd be responsible for all the work (reading in the JSON, splitting it, getting the fields out into attributes). Groovy, for example, has a JsonSlurper which reads the JSON into an object; at that point you could access the array (using object notation, not JSON Path), call each(), then access the members (again using object notation) and set flow file attributes accordingly.
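Here is a minimal ExecuteScript (Groovy) sketch of that idea. The JSON structure and the field names ("items", "id", "name") are assumptions made up for illustration:

```groovy
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if (!flowFile) return

// Read the incoming JSON into a plain Groovy object
def json = null
session.read(flowFile, { inputStream ->
    json = new JsonSlurper().parse(inputStream)
} as InputStreamCallback)

// Assumed structure: { "items": [ { "id": 1, "name": "a" }, ... ] }
// Walk the array with object notation (not JSON Path) and copy
// fields into flow file attributes.
json.items.eachWithIndex { item, i ->
    flowFile = session.putAttribute(flowFile, "item.${i}.id", String.valueOf(item.id))
    flowFile = session.putAttribute(flowFile, "item.${i}.name", String.valueOf(item.name))
}

session.transfer(flowFile, REL_SUCCESS)
```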
10-20-2016 02:53 PM
If your Python script is just calling out to the operating system, consider using ExecuteStreamCommand or ExecuteProcess instead. Otherwise, there are (hopefully!) pure Python modules for making HTTP requests, rather than shelling out to curl via os.system().
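For example, here is a minimal sketch using urllib2 from the Python 2 standard library (which is available under Jython, the engine behind ExecuteScript's Python support); the URL is a placeholder:

```python
import urllib2

# Plain HTTP GET without shelling out to curl; the URL is a placeholder
response = urllib2.urlopen('http://example.com/api/data')
print(response.getcode())  # HTTP status code
print(response.read())     # response body
```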
10-15-2016 06:10 PM
Try building with the -Phortonworks profile, or adding http://repo.hortonworks.com/content/repositories/releases/ to the list of repositories in the POM.
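If you go the POM route, the entry would look something like this sketch (the <id> value is arbitrary):

```xml
<repositories>
  <!-- Hortonworks releases repository; the id is arbitrary -->
  <repository>
    <id>hortonworks-releases</id>
    <url>http://repo.hortonworks.com/content/repositories/releases/</url>
  </repository>
</repositories>
```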
10-14-2016 07:06 PM · 1 Kudo
You are running into NIFI-2873; this will be fixed in an upcoming version.