Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 414 | 09-30-2025 05:23 AM |
| | 726 | 06-26-2025 01:21 PM |
| | 629 | 06-19-2025 02:48 PM |
| | 837 | 05-30-2025 01:53 PM |
| | 11330 | 02-22-2024 12:38 PM |
08-11-2016
03:09 PM
1 Kudo
In addition to @Andrew Grande's answer, since it is a custom processor you could add a property to specify additional folders, JARs, etc., and have the processor build a classloader from those locations. Then you wouldn't have to worry about a NiFi-relative location, although the location(s) would still need to be accessible to each node running the processor (to Andrew's point). There are examples of this "modules" property in the ExecuteScript and JoltTransformJSON processors.
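In NiFi's Java code this is done with a URLClassLoader; as a rough illustration of the same "modules property" idea, here is a Python sketch (the kind of thing the Module Directory property does for Jython scripts, where the locations are added to the import path). The function name and demo module are my own, not anything from NiFi:

```python
import os
import sys
import tempfile

def add_module_locations(module_dirs):
    """Add each location from a comma-separated list to the import path,
    mimicking a 'Module Directory'-style processor property."""
    for location in (p.strip() for p in module_dirs.split(",")):
        if location and location not in sys.path:
            sys.path.insert(0, location)

# Demo: create a throwaway module in a temp dir, register the dir, import it.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "extra_helper.py"), "w") as f:
    f.write("GREETING = 'loaded from module directory'\n")

add_module_locations(tmp)
import extra_helper
print(extra_helper.GREETING)  # loaded from module directory
```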
08-10-2016
12:43 AM
2 Kudos
This is a bug in NiFi/HDF, recorded here. As you mentioned, even with the limitations in the various JDBC drivers, returned values are usually expected to adhere to the specification; in this case, a Timestamp value should have retained as much precision as feasible, hence the bug. (For Time types, however, driver behavior is less consistent.) A possible workaround is to schedule the QueryDatabaseTable processor to run at intervals when you expect new data, and perhaps to add a DetectDuplicate processor somewhere in your flow.
08-03-2016
07:29 PM
4 Kudos
I’ve seen this too; I believe the errors are mostly red herrings. Somewhere in the stack trace there should be a statement that the client can’t connect to the DataNode, and it will list the internal IP (10.0.2.15, e.g.) instead of 127.0.0.1. That causes the minReplication issue, etc.

This setting is supposed to fix it: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html#Clients_use_Hostnames_when_connecting_to_DataNodes It makes the client(s) do DNS resolution, though I never got it working myself. If it does work, then when the NameNode tells the client where the DataNode is, it returns a hostname instead of an IP, which should then resolve on the client (you may need to update /etc/hosts) to localhost, since the sandbox is using NAT. The relevant ports (50010, I believe) also need to be forwarded directly.

An alternative is to switch the sandbox to a Host-Only Adapter so it gets its own IP. However, IIRC Hadoop hard-codes (or at least used to hard-code) the IP deep in its internals, so I’m not sure that would work by itself either. @bbende has some ideas in another HCC post as well: https://community.hortonworks.com/questions/47083/nifi-puthdfs-error.html
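For reference, the setting that Hadoop page describes is a client-side property in hdfs-site.xml; a minimal fragment would look like this (property name per the linked docs, the comment is my own summary):

```xml
<!-- hdfs-site.xml on the client: ask the NameNode for DataNode hostnames
     instead of NAT-internal IPs, so /etc/hosts can map them to localhost -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```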
08-03-2016
12:45 AM
1 Kudo
With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value. Then, as your intuition suggests, you can add another ReplaceText after ConvertJSONtoSQL that matches the marker (including its surrounding quotes) and replaces it with the Expression Language statement, thereby removing the quotes and substituting the correct literal.
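To show the regex mechanics of that second ReplaceText (this is just an illustrative sketch, not actual processor configuration, and the EL expression is an arbitrary example):

```python
import re

# SQL as ConvertJSONtoSQL might emit it, with the quoted marker in place
sql = "INSERT INTO events (id, date_time) VALUES (?, '@@DATE_HERE@@')"

# Match the marker *including* its quotes, so the replacement EL expression
# lands in the statement unquoted (ReplaceText would then evaluate the EL)
fixed = re.sub(r"'@@DATE_HERE@@'", "${now():toNumber()}", sql)
print(fixed)  # INSERT INTO events (id, date_time) VALUES (?, ${now():toNumber()})
```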
08-01-2016
12:58 PM
2 Kudos
It could likely be done with a combination of processors: one part of the flow reads the file and loads the conditions into a DistributedMapCache; another reads the input file (GetFile, or ListFile -> FetchFile), possibly splits it into individual records (with SplitText), extracts the desired values with ExtractText, then gets the conditions from the DistributedMapCache and routes (with RouteOnAttribute) to the various paths. If you are comfortable with a programming language like Groovy, Jython, JRuby, Lua, or JavaScript, you could use InvokeScriptedProcessor to accomplish any or all of the above. I'd recommend you keep the script to just reading the config file and filtering the data, as the processors above handle the remaining tasks very well. If you will only have two routes, you can also use ExecuteScript, but that processor only gives you "success" and "failure" routes; InvokeScriptedProcessor lets you implement a full Processor, so you can define your own relationships/routes. I have some examples (here and here) of InvokeScriptedProcessor, along with many other examples of scripting in NiFi, on my blog.
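The filtering step a script could own might look like this pure-Python sketch (the condition names, record fields, and rules are all invented for illustration; in a real InvokeScriptedProcessor the routes would be the processor's relationships):

```python
# Conditions as they might be loaded from the config file (inlined here):
# route name -> predicate over a record.
conditions = {
    "errors": lambda rec: rec["level"] == "ERROR",
    "slow":   lambda rec: rec["millis"] > 1000,
}

def route(record):
    """Return the first matching route name, or 'unmatched'."""
    for name, test in conditions.items():
        if test(record):
            return name
    return "unmatched"

records = [
    {"level": "ERROR", "millis": 40},
    {"level": "INFO",  "millis": 2500},
    {"level": "INFO",  "millis": 10},
]
routed = [route(r) for r in records]
print(routed)  # ['errors', 'slow', 'unmatched']
```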
07-27-2016
05:31 PM
1 Kudo
It is certainly worth considering adding support for such clauses to the QueryDatabaseTable (and soon, GenerateTableFetch) processors. We will also have to consider whether the various drivers (Oracle, MySQL, PostgreSQL, etc.) support such notation (or some variant), and how to handle the clauses when the database type does not support them. Do you mind filing a Jira case for this improvement?
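To make the requested improvement concrete, here is a hypothetical sketch of how optional user-supplied clauses could be bolted onto the generated incremental-fetch query; none of these parameter names are actual processor properties:

```python
def build_query(table, max_value_column, last_max, extra_where=None, order_by=None):
    """Hypothetical: append optional WHERE/ORDER BY fragments to the
    incremental-fetch query a QueryDatabaseTable-style processor generates."""
    query = f"SELECT * FROM {table} WHERE {max_value_column} > {last_max}"
    if extra_where:
        query += f" AND ({extra_where})"
    if order_by:
        query += f" ORDER BY {order_by}"
    return query

q = build_query("orders", "id", 100, extra_where="status = 'NEW'", order_by="id")
print(q)  # SELECT * FROM orders WHERE id > 100 AND (status = 'NEW') ORDER BY id
```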
07-27-2016
04:11 AM
1 Kudo
Perhaps you are being rate-limited by the Twitter API? How often is GetTwitter executing? You can check this on the Scheduling tab of the processor configuration dialog. If "Run Schedule" is set to zero seconds, then the processor will run as fast as possible, which could certainly cause throttling / rate-limiting and thus an eventual lack of data.
07-27-2016
01:34 AM
5 Kudos
Certainly! The Module Directory property in the ExecuteScript processor is for exactly this purpose: you can give it a comma-separated list of directories and/or JAR files, and it will add them to the script's classpath. I have a blog post with an example (bringing Hazelcast in to get data into flow files): http://funnifi.blogspot.com/2016/02/executescript-using-modules.html Also, if you add the Apache Ivy JAR to your NiFi lib/ folder (normally a no-no, but OK in this case), you can even leverage the @Grab annotation to bring in dependencies; I have a post with an example here: http://funnifi.blogspot.com/2016/05/using-groovy-grab-with-executescript.html
07-21-2016
07:28 PM
If you stop the processor that ConvertJSONtoSQL is connected to, you will see your flow files in the connection queue (between the processors). You can right-click on that and choose ListQueue, then pick any of the files and click on the Info button (looks like a question mark) and choose the Attributes tab. That should show all the flow file attributes including the sql.args pairs. Alternatively you can connect ConvertJSONtoSQL to a LogAttribute processor and check logs/nifi-app.log to see the attributes being printed out.
07-21-2016
05:35 PM
2 Kudos
The SQL generated is a prepared statement (the question marks are placeholders for the values). In the flow file coming out of ConvertJSONtoSQL, you should see attributes such as "sql.args.1.type" and "sql.args.1.value"; there should be a pair of attributes like that for each of the columns (it looks like 6 in your case). Are those attributes present and valid?
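To illustrate the attribute convention, here is a sketch with made-up attribute values, showing how the sql.args pairs line up with the '?' placeholders (the pairing loop mimics what happens downstream when the values are bound to the prepared statement):

```python
# Hypothetical flow-file attributes as ConvertJSONtoSQL writes them:
# one sql.args.N.type / sql.args.N.value pair per '?' placeholder.
attributes = {
    "sql.args.1.type": "4",    # java.sql.Types.INTEGER
    "sql.args.1.value": "42",
    "sql.args.2.type": "12",   # java.sql.Types.VARCHAR
    "sql.args.2.value": "hello",
}

def collect_sql_args(attrs):
    """Gather (type, value) pairs in placeholder order."""
    pairs = []
    n = 1
    while f"sql.args.{n}.value" in attrs:
        pairs.append((attrs.get(f"sql.args.{n}.type"), attrs[f"sql.args.{n}.value"]))
        n += 1
    return pairs

print(collect_sql_args(attributes))  # [('4', '42'), ('12', 'hello')]
```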