Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 414 | 09-30-2025 05:23 AM |
| | 726 | 06-26-2025 01:21 PM |
| | 629 | 06-19-2025 02:48 PM |
| | 837 | 05-30-2025 01:53 PM |
| | 11330 | 02-22-2024 12:38 PM |
08-11-2016
03:09 PM
1 Kudo
In addition to @Andrew Grande's answer, since it is a custom processor you could add a property to specify additional folders, JARs, etc., and have the processor build a classloader from those locations. Then you wouldn't have to worry about a NiFi-relative location, although the location(s) would still need to be accessible to each node running the processor (to Andrew's point). There are examples of this "modules" property in the ExecuteScript and JoltTransformJSON processors.
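In NiFi's Java code this is done with a URLClassLoader; as a rough illustration of the same "modules property" idea, here is a Python sketch (the kind of thing the Module Directory property does for Jython scripts, where the locations are added to the import path). The function name and demo module are my own, not anything from NiFi:

```python
import os
import sys
import tempfile

def add_module_locations(module_dirs):
    """Add each location from a comma-separated list to the import path,
    mimicking a 'Module Directory'-style processor property."""
    for location in (p.strip() for p in module_dirs.split(",")):
        if location and location not in sys.path:
            sys.path.insert(0, location)

# Demo: create a throwaway module in a temp dir, register the dir, import it.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "extra_helper.py"), "w") as f:
    f.write("GREETING = 'loaded from module directory'\n")

add_module_locations(tmp)
import extra_helper
print(extra_helper.GREETING)  # loaded from module directory
```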
08-10-2016
12:43 AM
2 Kudos
This is a bug in NiFi/HDF, recorded here. As you mentioned, even with the limitations in the various JDBC drivers, returned values are usually expected to adhere to the specification; in this case, a Timestamp value should have retained as much precision as feasible, hence the bug. (For Time types, however, driver behavior is less consistent.) A possible workaround is to schedule the QueryDatabaseTable processor to run at intervals when you expect new data, and perhaps to add a DetectDuplicate processor somewhere in your flow.
08-03-2016
07:29 PM
4 Kudos
I’ve seen this too; I believe the errors are mostly red herrings. Somewhere in the stack trace there should be a statement that the client can’t connect to the DataNode, and it will list the internal IP (10.0.2.15, e.g.) instead of 127.0.0.1. That causes the minReplication issue, etc.

This setting is supposed to fix it: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html#Clients_use_Hostnames_when_connecting_to_DataNodes It makes the client(s) do DNS resolution, though I never got it working myself. If it does work, then when the NameNode tells the client where the DataNode is, it returns a hostname instead of an IP, which should then resolve on the client (you may need to update /etc/hosts) to localhost, since the sandbox is using NAT. The relevant ports (50010, I believe) also need to be forwarded directly.

An alternative is to switch the sandbox to a Host-Only Adapter so it gets its own IP. However, IIRC Hadoop hard-codes (or at least used to hard-code) the IP deep in its internals, so I’m not sure that would work by itself either. @bbende has some ideas in another HCC post as well: https://community.hortonworks.com/questions/47083/nifi-puthdfs-error.html
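For reference, the setting that Hadoop page describes is a client-side property in hdfs-site.xml; a minimal fragment would look like this (property name per the linked docs, the comment is my own summary):

```xml
<!-- hdfs-site.xml on the client: ask the NameNode for DataNode hostnames
     instead of NAT-internal IPs, so /etc/hosts can map them to localhost -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```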
08-03-2016
12:45 AM
1 Kudo
With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value. Then, as your intuition suggests, you can add another ReplaceText after ConvertJSONtoSQL that matches the marker (including its surrounding quotes) and replaces it with the Expression Language statement, thereby removing the quotes and substituting the correct literal.
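To show the regex mechanics of that second ReplaceText (this is just an illustrative sketch, not actual processor configuration, and the EL expression is an arbitrary example):

```python
import re

# SQL as ConvertJSONtoSQL might emit it, with the quoted marker in place
sql = "INSERT INTO events (id, date_time) VALUES (?, '@@DATE_HERE@@')"

# Match the marker *including* its quotes, so the replacement EL expression
# lands in the statement unquoted (ReplaceText would then evaluate the EL)
fixed = re.sub(r"'@@DATE_HERE@@'", "${now():toNumber()}", sql)
print(fixed)  # INSERT INTO events (id, date_time) VALUES (?, ${now():toNumber()})
```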
08-01-2016
12:58 PM
2 Kudos
It could likely be done with a combination of processors: one part of the flow reads the file and loads the conditions into a DistributedMapCache; another reads the input file (GetFile, or ListFile -> FetchFile), possibly splits it into individual records (with SplitText), extracts the desired values with ExtractText, then gets the conditions from the DistributedMapCache and routes (with RouteOnAttribute) to the various paths. If you are comfortable with a programming language like Groovy, Jython, JRuby, Lua, or JavaScript, you could use InvokeScriptedProcessor to accomplish any or all of the above. I'd recommend you keep the script to just reading the config file and filtering the data, as the processors above handle the remaining tasks very well. If you will only have two routes, you can also use ExecuteScript, but that processor only gives you "success" and "failure" routes; InvokeScriptedProcessor lets you implement a full Processor, so you can define your own relationships/routes. I have some examples (here and here) of InvokeScriptedProcessor, along with many other examples of scripting in NiFi, on my blog.
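The filtering step a script could own might look like this pure-Python sketch (the condition names, record fields, and rules are all invented for illustration; in a real InvokeScriptedProcessor the routes would be the processor's relationships):

```python
# Conditions as they might be loaded from the config file (inlined here):
# route name -> predicate over a record.
conditions = {
    "errors": lambda rec: rec["level"] == "ERROR",
    "slow":   lambda rec: rec["millis"] > 1000,
}

def route(record):
    """Return the first matching route name, or 'unmatched'."""
    for name, test in conditions.items():
        if test(record):
            return name
    return "unmatched"

records = [
    {"level": "ERROR", "millis": 40},
    {"level": "INFO",  "millis": 2500},
    {"level": "INFO",  "millis": 10},
]
routed = [route(r) for r in records]
print(routed)  # ['errors', 'slow', 'unmatched']
```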
07-27-2016
05:31 PM
1 Kudo
It is certainly worth considering adding support for such clauses to the QueryDatabaseTable (and soon, GenerateTableFetch) processors. We will also have to consider whether the various drivers (Oracle, MySQL, PostgreSQL, etc.) support such notation (or some variant), and how to handle the clauses when the database type does not support them. Do you mind filing a Jira case for this improvement?
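To make the requested improvement concrete, here is a hypothetical sketch of how optional user-supplied clauses could be bolted onto the generated incremental-fetch query; none of these parameter names are actual processor properties:

```python
def build_query(table, max_value_column, last_max, extra_where=None, order_by=None):
    """Hypothetical: append optional WHERE/ORDER BY fragments to the
    incremental-fetch query a QueryDatabaseTable-style processor generates."""
    query = f"SELECT * FROM {table} WHERE {max_value_column} > {last_max}"
    if extra_where:
        query += f" AND ({extra_where})"
    if order_by:
        query += f" ORDER BY {order_by}"
    return query

q = build_query("orders", "id", 100, extra_where="status = 'NEW'", order_by="id")
print(q)  # SELECT * FROM orders WHERE id > 100 AND (status = 'NEW') ORDER BY id
```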
07-27-2016
04:11 AM
1 Kudo
Perhaps you are being rate-limited by the Twitter API? How often is GetTwitter executing? You can check this on the Scheduling tab of the processor configuration dialog. If "Run Schedule" is set to zero seconds, then the processor will run as fast as possible, which could certainly cause throttling / rate-limiting and thus an eventual lack of data.
07-27-2016
01:34 AM
5 Kudos
Certainly! The Module Directory property in the ExecuteScript processor is for exactly this purpose: you can give it a comma-separated list of directories and/or JAR files, and it will add them to the script's classpath. I have a blog post with an example (bringing Hazelcast in to get data into flow files): http://funnifi.blogspot.com/2016/02/executescript-using-modules.html Also, if you add the Apache Ivy JAR to your NiFi lib/ folder (normally a no-no, but OK in this case), you can even leverage the @Grab annotation to bring in dependencies; I have a post with an example here: http://funnifi.blogspot.com/2016/05/using-groovy-grab-with-executescript.html
07-21-2016
07:28 PM
If you stop the processor that ConvertJSONtoSQL is connected to, you will see your flow files in the connection queue (between the processors). You can right-click on that and choose ListQueue, then pick any of the files and click on the Info button (looks like a question mark) and choose the Attributes tab. That should show all the flow file attributes including the sql.args pairs. Alternatively you can connect ConvertJSONtoSQL to a LogAttribute processor and check logs/nifi-app.log to see the attributes being printed out.
07-21-2016
05:35 PM
2 Kudos
The SQL generated is a prepared statement (the question marks are placeholders for the values). In the flow file coming out of ConvertJSONtoSQL, you should see attributes such as "sql.args.1.type" and "sql.args.1.value"; there should be a pair of attributes like that for each of the columns (it looks like 6 in your case). Are those attributes present and valid?
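To illustrate the attribute convention, here is a sketch with made-up attribute values, showing how the sql.args pairs line up with the '?' placeholders (the pairing loop mimics what happens downstream when the values are bound to the prepared statement):

```python
# Hypothetical flow-file attributes as ConvertJSONtoSQL writes them:
# one sql.args.N.type / sql.args.N.value pair per '?' placeholder.
attributes = {
    "sql.args.1.type": "4",    # java.sql.Types.INTEGER
    "sql.args.1.value": "42",
    "sql.args.2.type": "12",   # java.sql.Types.VARCHAR
    "sql.args.2.value": "hello",
}

def collect_sql_args(attrs):
    """Gather (type, value) pairs in placeholder order."""
    pairs = []
    n = 1
    while f"sql.args.{n}.value" in attrs:
        pairs.append((attrs.get(f"sql.args.{n}.type"), attrs[f"sql.args.{n}.value"]))
        n += 1
    return pairs

print(collect_sql_args(attributes))  # [('4', '42'), ('12', 'hello')]
```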