Member since: 11-16-2015
Posts: 892
Kudos Received: 649
Solutions: 245
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5206 | 02-22-2024 12:38 PM |
| | 1337 | 02-02-2023 07:07 AM |
| | 3004 | 12-07-2021 09:19 AM |
| | 4155 | 03-20-2020 12:34 PM |
| | 13948 | 01-27-2020 07:57 AM |
07-19-2016
04:54 PM
Hans, there is an email thread that talks about how to (currently) get EvaluateXPath to work with namespaces. The email talks about default namespaces, but the approach works for explicit namespaces too. However, if multiple namespaces share the same "local name", that can cause problems. There is a Jira case to add support for namespaces in the XPath/XQuery processors: https://issues.apache.org/jira/browse/NIFI-1023
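The email thread itself isn't quoted here, but a common workaround (an assumption on my part, not necessarily exactly what the thread describes) is to match on local-name() so the namespace doesn't matter. A minimal Python/lxml sketch (element names and namespace URI are hypothetical):

```python
from lxml import etree

# Sample document with a default namespace; a plain XPath such as
# "/root/item" will not match these elements.
xml = b"""<root xmlns="http://example.com/ns">
  <item>value</item>
</root>"""

doc = etree.fromstring(xml)

# Namespace-agnostic matching via local-name(): the prefix/namespace is
# ignored, only the local element name is compared.
result = doc.xpath("/*[local-name()='root']/*[local-name()='item']/text()")
print(result)  # ['value']
```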
07-18-2016
02:40 AM
The error in the bulletin is unfortunately not very descriptive; can you check the logs for the cause of that exception?
07-17-2016
10:21 PM
Here's my answer from StackOverflow: There is a subproject of Apache NiFi called MiNiFi, which (among other things) aims to put agents on devices and similar hardware in order to collect data at its point of creation. This will include native agents, so a JVM will not be required. The proposed roadmap is here; it mentions the development of native agent(s).
07-15-2016
02:18 PM
2 Kudos
You could use GetFile -> SplitText -> ExtractText -> InvokeHttp:

- GetFile gets the configuration file; set "Keep source file" to true and schedule it to run once a day
- SplitText splits the file into multiple flow files, each containing a single line/URL
- ExtractText can put the contents of the flow file into an attribute (called "my.url", for example)
- InvokeHttp can be configured to use an Expression Language construct for the URL property (such as "${my.url}")
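As a rough illustration only, here is the same flow sketched in plain Python (the file name and the requests library are placeholders, not part of the NiFi answer):

```python
import requests  # assumption: any HTTP client would do

CONFIG_FILE = "urls.conf"  # hypothetical configuration file path

# GetFile + SplitText: read the file and treat each non-empty line as
# one flow file containing a single URL.
with open(CONFIG_FILE) as f:
    urls = [line.strip() for line in f if line.strip()]

# ExtractText: each line becomes the "my.url" attribute.
# InvokeHttp: issue a request against ${my.url}.
for url in urls:
    response = requests.get(url, timeout=10)
    print(url, response.status_code)
```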
07-14-2016
04:47 PM
1 Kudo
QueryDatabaseTable would require a "last modified" column in the table(s) in order to detect updates, and probably a "logical delete" flag (i.e. a boolean column) in that (or a helper) table in order to detect deletes. This is similar to what Apache Sqoop does.

If you have the Enterprise Edition of SQL Server, you may be able to enable its Change Data Capture feature. Then, for incremental changes, you can use QueryDatabaseTable against the "CDC table" rather than your source tables.

For strict migration (no incremental fetching of updates) of multiple tables in SQL Server, if you can generate individual flow files, each containing an attribute such as "table.name", then you could parallelize across a NiFi cluster by sending them to ExecuteSQL with the query set to "SELECT * FROM ${table.name}". In this case each instance of ExecuteSQL will get all the rows from one table into an Avro record and send it along the flow.

Regarding MongoDB, I don't believe the MongoDB processors support incremental fetching. QueryDatabaseTable might work on flat documents, but there is a bug that prevents nested fields from being returned, and aliasing the columns won't work for the incremental-fetch part. However, ExecuteSQL will work if you explicitly list (and alias) the document fields in the SQL statement, though that won't do incremental fetch either.

You might be able to use Sqoop for such things, but there are additional requirements if using sqoop-import-all-tables, and if doing incremental fetch you'd need 250 calls to sqoop import.

Do your tables all have a "last modified" column or some similar structure? Supporting distributed incremental fetch for arbitrary tables is a difficult problem, as you'd need to know the appropriate "last modified" column for each table (if they're not named the same and/or present in every table). When tables all behave the same way from an update perspective, the problem becomes much easier.
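To make the incremental-fetch idea concrete, here is a hedged sketch of what QueryDatabaseTable does conceptually; the driver, DSN, table, and column names are all assumptions for illustration:

```python
import pyodbc  # assumption: any DB-API driver for SQL Server would work

conn = pyodbc.connect("DSN=sqlserver")  # hypothetical data source name
cursor = conn.cursor()

# QueryDatabaseTable tracks the maximum value it has seen in the
# "maximum-value column" (here: last_modified) and only fetches newer
# rows on each run. NiFi persists this state; it's in memory here.
last_seen = None

def fetch_increment(table):
    global last_seen
    if last_seen is None:
        cursor.execute(f"SELECT * FROM {table}")  # first run: full fetch
    else:
        cursor.execute(
            f"SELECT * FROM {table} WHERE last_modified > ?", last_seen
        )
    rows = cursor.fetchall()
    if rows:
        # Advance the state to the newest timestamp we saw.
        last_seen = max(row.last_modified for row in rows)
    return rows

print(len(fetch_increment("my_table")))  # hypothetical table name
```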
07-07-2016
03:36 PM
1 Kudo
In EvaluateJsonPath you can choose "flowfile-attribute" as the Destination; then the original JSON will still be in the flow file content, and any extracted JSON elements will be in the flow file's attributes. That can go into RouteOnAttribute for "eventname". Then you can use ReplaceText (or ExecuteScript if you prefer) to create a CQL statement, using Expression Language to insert the values from your attributes, or to wrap the entire JSON object in a CQL statement. I have a template that uses ReplaceText to put an entire JSON object into an "INSERT INTO myTable JSON" CQL statement; it is available as a Gist (here). It doesn't have a PutCassandraQL processor at the end; instead it's a LogAttribute processor, so you can check whether the CQL looks right for what you're trying to do.
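As a hedged illustration of what that ReplaceText step produces (this is a sketch, not the exact template from the Gist), wrapping a flow file's JSON in Cassandra's INSERT ... JSON syntax looks like this in plain Python:

```python
# Hypothetical flow file content (the incoming JSON event).
flowfile_content = '{"eventname": "login", "user": "alice"}'

# Equivalent of the ReplaceText step: wrap the whole JSON object in a
# CQL "INSERT ... JSON" statement. Single quotes inside the JSON must
# be doubled to escape them in CQL.
escaped = flowfile_content.replace("'", "''")
cql = f"INSERT INTO myTable JSON '{escaped}';"  # myTable per the answer

print(cql)
# INSERT INTO myTable JSON '{"eventname": "login", "user": "alice"}';
```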
06-30-2016
07:57 PM
4 Kudos
In curl, the -d option puts the data into the request body. The InvokeHttp processor does not send the contents of the flow file for GET requests, only PUT or POST; even though the Elasticsearch Search/Query API accepts GET, this approach probably won't work. What you may be looking for is the URL Search API; I commented on that in another thread, but will post here too. Using this method, you can put your query in the URL itself. Note that the query parameters look a bit different because they're not JSON; they are HTTP query parameters. In your example you are matching all documents (which I believe is the default), so http://localhost:9200/tweet_library/_search?size=10000 should be all you need for that case. To explicitly match all documents, you can use the q parameter: http://localhost:9200/tweet_library/_search?size=10000&q=*:* There are quite a few query options available with the URL Search API; please see the Elasticsearch documentation for more information.
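If it helps, here is a minimal sketch of the same URL search performed outside NiFi; the host and index come from the question, and the requests library is an assumption:

```python
import requests  # assumption: any HTTP client would do

# URL Search API: the query lives in HTTP query parameters, not a JSON body.
url = "http://localhost:9200/tweet_library/_search"
params = {"size": 10000, "q": "*:*"}  # q=*:* explicitly matches everything

response = requests.get(url, params=params)
print(len(response.json()["hits"]["hits"]))
```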
06-29-2016
05:53 PM
1 Kudo
Your curl command uses the -d parameter, which means it's sending that JSON in the body of the request. To do that with InvokeHttp, you could put a GenerateFlowFile -> ReplaceText pair before InvokeHttp, where ReplaceText sets the body to the query you have above. Alternatively, you could use the URL Search API for Elasticsearch. In your example you are matching all documents (which I believe is the default), so http://localhost:9200/tweet_library/_search?size=10000 should be all you need for that case. To explicitly match all documents, you can use the q parameter: http://localhost:9200/tweet_library/_search?size=10000&q=*:*
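For comparison with the curl command, here is a hedged Python sketch of sending that query in the request body (the match_all body is assumed from the question, and requests is an assumption):

```python
import requests  # assumption: any HTTP client would do

# Equivalent of curl -d: the search query travels in the request body.
# Elasticsearch accepts a body on GET, but InvokeHttp only sends flow
# file content for PUT/POST, and POST works identically here.
query = {"size": 10000, "query": {"match_all": {}}}

response = requests.post(
    "http://localhost:9200/tweet_library/_search", json=query
)
print(response.json()["hits"]["total"])
```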
06-28-2016
12:50 AM
1 Kudo
As @Pierre Villard mentioned, FetchElasticsearch should not require an incoming connection; this has been captured as part of NIFI-1576. However, to extract all documents from a particular index and (optional) type, you'll need the Search API, whereas FetchElasticsearch uses the Get API. To use the Search API, you can use the InvokeHttp processor with your own search query. Please see this related HCC post: https://community.hortonworks.com/questions/41951/how-to-get-all-values-with-expression-language-in.html
06-27-2016
07:35 PM
The FetchElasticsearch processor uses the Get API, which requires a single document identifier and doesn't support regular expressions. As an alternative, you can use InvokeHttp to call the Multi-Get API or the Search API, which give you more control over the retrieval of multiple documents.
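For example, here is a minimal sketch of a Multi-Get call such as InvokeHttp would issue; the host, index, type, and document IDs are all hypothetical:

```python
import requests  # assumption: any HTTP client would do

# Multi-Get API: fetch several documents by ID in a single request.
body = {"ids": ["doc-1", "doc-2", "doc-3"]}  # hypothetical document IDs

response = requests.post(
    "http://localhost:9200/my_index/my_type/_mget", json=body
)
for doc in response.json()["docs"]:
    print(doc["_id"], doc.get("found"))
```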