Member since 11-16-2015
905 Posts
665 Kudos Received
249 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 425 | 09-30-2025 05:23 AM |
| | 754 | 06-26-2025 01:21 PM |
| | 646 | 06-19-2025 02:48 PM |
| | 841 | 05-30-2025 01:53 PM |
| | 11355 | 02-22-2024 12:38 PM |
09-09-2016 07:21 PM
The PutElasticsearch processor uses the Transport API for Elasticsearch, not the HTTP API, so your port should be 9300, not 9200. The "Identifier Attribute" property is the name of a flow file attribute that contains a unique identifier for the document. If you don't have an identifier you want to use, you can put "uuid"; this will use the flow file's UUID as the identifier for the Elasticsearch document. If you do have an identifier for the document, put its value (using UpdateAttribute, EvaluateJsonPath, etc.) into a flow file attribute, and put that attribute's name in the "Identifier Attribute" property. Note that you don't use Expression Language here, so if your attribute's name is "doc_id", you put "doc_id" in the Identifier Attribute property, not "${doc_id}".
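A minimal sketch of the two processors' settings (the attribute name "doc_id", the JsonPath, and the host/index values are hypothetical):

EvaluateJsonPath:
  Destination: flowfile-attribute
  doc_id: $.id    (dynamic property mapping the attribute name to a JsonPath)

PutElasticsearch:
  ElasticSearch Hosts: es-host:9300
  Identifier Attribute: doc_id
  Index: myindex
  Type: mydoc

Here EvaluateJsonPath pulls the document's "id" field into the "doc_id" attribute, and PutElasticsearch uses that attribute's value as the Elasticsearch document ID.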
09-06-2016 12:47 AM
2 Kudos
Yes, you can use ListFile -> FetchFile. ListFile keeps track of which files it has already listed, so it only emits files it has not seen before. FetchFile then fetches the contents of each file passed to it. Together they work like GetFile, except that ListFile maintains state about the files it has already read (see the sketch below).
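A minimal sketch of the key properties (the directory is hypothetical; the File to Fetch expression shown is FetchFile's default):

ListFile:
  Input Directory: /data/incoming

FetchFile:
  File to Fetch: ${absolute.path}/${filename}

ListFile emits one content-less flow file per newly seen file, carrying attributes such as filename and absolute.path; FetchFile reads that path and writes the file's bytes into the flow file content.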
09-01-2016 09:50 PM
Here's the example for "prefix soup" (a kind of flattening):

Input:

{
  "Rating": 1,
  "SecondaryRatings": {
    "Design": 4,
    "Price": 2,
    "RatingDimension3": 1
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "Rating": "rating-primary",
      // Turn all the SecondaryRatings into prefixed data
      // like "rating-Design": 4
      "SecondaryRatings": {
        // the "&" in "rating-&" means go up the tree 0 levels,
        // grab what is there, and substitute it in
        "*": "rating-&"
      }
    }
  }
]
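Running that spec over the input should produce output like:

{
  "rating-primary": 1,
  "rating-Design": 4,
  "rating-Price": 2,
  "rating-RatingDimension3": 1
}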
09-01-2016 06:23 PM
1 Kudo
If you click on any of the examples, it should fill in the Input and Spec boxes.
09-01-2016 06:21 PM
1 Kudo
After your FetchSFTP, the bar-delimited content will be in the content of the flow file, not in the attributes. You then have an AttributesToJSON processor, which will overwrite the flow file content with a JSON document containing attributes such as sftp.remote.host, sftp.remote.port, etc. (see the documentation for AttributesToJSON). I think you may want a SplitText processor after your FetchSFTP processor instead, to create one flow file for each line in your file. Then you could have an ExtractText processor that uses a regular expression (with capture groups) to create attributes such as column.1, column.2, etc., which your ReplaceText can then use (see the sketch below).
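A rough sketch, assuming each line has three bar-delimited fields (the property name "column" and the regex are hypothetical): in ExtractText, add a dynamic property named "column" with a value such as

^([^|]*)\|([^|]*)\|([^|]*)$

Each capture group becomes an attribute (column.1, column.2, column.3), which ReplaceText can reference in its Replacement Value, e.g. ${column.1},${column.2},${column.3}.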
09-01-2016 06:15 PM
1 Kudo
That class is part of the write-ahead log implementation in nifi-commons; I believe there was a recent update to its structure. Now that Apache NiFi 1.0.0 has been released, you should use that instead of the 1.0.0-BETA (which I see in your logs is the version you're running).
09-01-2016 02:06 PM
4 Kudos
The output from one processor to another is a flow file, which consists of a map of attributes (key/value pairs) and a payload of bytes as the flow file content. The content could be raw binary data (an image, for example) or text in any format (JSON, XML, and CSV, to name a few).

The content and attributes of a flow file are manipulated by processors in different ways; the documentation for each processor describes which attributes it reads and/or writes, as well as what operations it may perform on the flow file. For example, the UpdateAttribute processor allows you to add attributes to (or delete them from) incoming flow files. Another example is SplitJson, which expects incoming flow files to have a JSON object as the flow file content; you configure the processor with a JsonPath expression pointing at an array within the object. The processor then splits the original JSON object and sends a flow file for each element of the array to the "split" relationship. It also sends the original incoming flow file to the "original" relationship. You can add connections between processors for the relationship(s) defined by the source processor; a concrete sketch follows below.

I encourage you to read the Overview and Getting Started guides for more information on the concepts of NiFi (flow files, processors, connections, etc.). If you are looking for working examples, there is a set of templates available on the NiFi Wiki.
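To make the SplitJson case concrete (the content and JsonPath are hypothetical): given flow file content

{
  "orders": [
    { "id": 1 },
    { "id": 2 }
  ]
}

and a JsonPath Expression of $.orders, SplitJson routes two flow files to "split" (with content { "id": 1 } and { "id": 2 }, respectively) and the untouched incoming flow file to "original".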
08-31-2016 12:42 PM
The FetchElasticsearch processor uses the native transport, whose default port is 9300. In NiFi 0.7.0 (and in the upcoming HDF 2.0), there are FetchElasticsearchHttp and PutElasticsearchHttp processors, which use the REST API (whose default port is 9200).
08-27-2016 02:40 PM
1 Kudo
Although the script engine reports its name as "python", it is actually Jython, which can only use pure Python modules, not native modules like numpy/scipy. If you need those, consider ExecuteProcess or (if you have incoming flow files) ExecuteStreamCommand, both of which can execute the command-line Python interpreter.
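For example (a minimal sketch; the script path and the numeric-content assumption are hypothetical), ExecuteStreamCommand pipes the flow file content to the process's stdin and replaces the content with the process's stdout, so a script like this could use numpy:

#!/usr/bin/env python
# transform.py -- run via ExecuteStreamCommand, e.g. with
#   Command Path: /usr/bin/python
#   Command Arguments: /path/to/transform.py
import sys
import numpy as np  # native module; works here because this is CPython, not Jython

data = np.loadtxt(sys.stdin)         # parse numeric flow file content from stdin
sys.stdout.write(str(data.mean()))   # stdout becomes the new flow file content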
08-25-2016 06:44 PM
2 Kudos
To add to Scott's answer, you can use QueryDatabaseTable (for a one-time export, or for incremental fetches if you choose a "maximum value column" such as the primary key column) into a ConvertAvroToORC processor (available in the 1.0 GA release), then a PutHDFS processor to get the data into Hadoop. If the table has not been created, ConvertAvroToORC generates partial Hive DDL in an attribute (CREATE TABLE IF NOT EXISTS ...), so after PutHDFS you could have a ReplaceText processor to put the DDL statement (along with the file's HDFS location) into the flow file content, then send that to a PutHiveQL processor, which would execute the DDL statement, creating the table atop the directory containing your file(s) in HDFS. That might sound a bit complicated, but it is flexible and powerful. I will post a template to the NiFi wiki after 1.0 is released, showing how such a flow would work.
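A rough sketch of the ReplaceText step, assuming the attribute names your versions emit match the docs (ConvertAvroToORC putting the partial DDL in "hive.ddl" and PutHDFS recording the target directory in "absolute.hdfs.path"): set Replacement Strategy to "Always Replace" and the Replacement Value to something like

${hive.ddl} LOCATION '${absolute.hdfs.path}'

so the flow file content becomes a complete CREATE TABLE statement for PutHiveQL to execute.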