Member since: 11-16-2015
Posts: 911
Kudos Received: 668
Solutions: 249

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 706 | 09-30-2025 05:23 AM |
| | 1076 | 06-26-2025 01:21 PM |
| | 932 | 06-19-2025 02:48 PM |
| | 1103 | 05-30-2025 01:53 PM |
| | 12290 | 02-22-2024 12:38 PM |
06-20-2017
07:47 PM
2 Kudos
What version of NiFi/HDF are you using? As of NiFi 1.2.0 / HDF 3.0.0, PutHiveQL can accept multiple statements in one flow file, so if you are currently dealing with one INSERT statement per flow file, try MergeContent to batch them into a single flow file. This should increase performance, but since Hive is an auto-commit database, PutHiveQL is probably not the best choice for large/fast ingest needs; you may be better off putting the data into HDFS and creating/loading a table from it. For PutHiveStreaming, there is a known issue that can reduce performance; it was also fixed in NiFi 1.2.0 / HDF 3.0.0.
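The batching step can be sketched in plain Python as a stand-in for what MergeContent does (assuming a `;` statement delimiter for PutHiveQL; the statements themselves are illustrative):

```python
# Each incoming flow file originally carries one INSERT statement.
statements = [
    "INSERT INTO logs VALUES (1, 'a')",
    "INSERT INTO logs VALUES (2, 'b')",
    "INSERT INTO logs VALUES (3, 'c')",
]

# MergeContent with a ';\n' demarcator would produce one flow file like this,
# which PutHiveQL (NiFi 1.2.0+) can then execute as multiple statements.
merged = ";\n".join(statements) + ";"
```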
06-20-2017
07:34 PM
That sounds like a different issue altogether; you might get more/better responses if you create a new question in HCC for it. The short answer is: your ControllerService implementation can have a method annotated with @OnEnabled (it looks like yours is called onConfigured()). That method can take a ConfigurationContext:

```java
@OnEnabled
public void onConfigured(final ConfigurationContext context) throws InitializationException {
    // Do stuff here
}
```

You can then call getProperty(), getProperties(), etc. on the context.
06-20-2017
07:14 PM
In addition to the approach (sqlplus with ExecuteProcess / ExecuteStreamCommand) mentioned in the other post, you could also connect via a scripting language using ExecuteScript, calling Connection.prepareCall() and such as described here. I have an example on how to interact with SQL using Groovy and ExecuteScript here.
06-20-2017
05:46 PM
Usually (as is the case for the nifi-hadoop-bundle), the NAR depends on nifi-hadoop-libraries-nar, which provides the Hadoop libraries (such as the provided dependencies in your processor POM, like hadoop-common) and whose parent NAR is nifi-standard-services-api-nar (which you have in your NAR POM). Currently, a NAR can have only one parent, so you wouldn't be able to depend on both the hadoop-libraries and standard-services-api NARs at the same time. Since the former depends on the latter, this still works for your processor. Try replacing the NAR POM dependency on nifi-standard-services-api-nar with nifi-hadoop-libraries-nar; this should provide all the classes/JARs/dependencies you need.
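A sketch of what that swap looks like in the NAR module's pom.xml (the version shown is illustrative; use the one matching your NiFi build):

```xml
<!-- Replace the nifi-standard-services-api-nar dependency with this one.
     nifi-hadoop-libraries-nar already has nifi-standard-services-api-nar
     as its parent, so those classes remain available. -->
<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-hadoop-libraries-nar</artifactId>
    <version>1.2.0</version>
    <type>nar</type>
</dependency>
```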
06-20-2017
05:37 PM
1 Kudo
I think you're running into this issue; perhaps try an explicit module unload at the end of your script? That will probably have a performance impact, but if it works, we can file an improvement Jira to look at adding this to the Jython support in NiFi, so modules are unloaded when the Module Directory property (or the files it points to) changes.
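A minimal sketch of the explicit-unload workaround, assuming your script's helpers live in sys.modules under known names (the module name "mymodule" here is hypothetical):

```python
import sys

def unload_modules(prefixes):
    """Remove cached modules whose names match any of the given prefixes,
    so the next import re-reads them from the Module Directory."""
    stale = [name for name in sys.modules
             if any(name == p or name.startswith(p + ".") for p in prefixes)]
    for name in stale:
        del sys.modules[name]
    return stale

# At the end of the script: force 'mymodule' (a hypothetical helper from the
# Module Directory) to be re-imported on the next run.
unload_modules(["mymodule"])
```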
06-20-2017
05:30 PM
I use bytearray() in my examples, but I haven't been able to figure out exactly when it's needed and when it isn't. I suspect it's when the type is 'unicode' or 'java.lang.String' instead of Jython's 'str' type. The following two lines worked for me:

```python
insertquery = "insert into Tweets_test values ('" + str(obj['id']) + "','" + obj['text'] + "','" + str(obj['id_str']) + "');"
outputStream.write(insertquery)
```

This page says that a Jython String will be coerced to byte[] when necessary, and that seems to be what's going on above.
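For cases where the value is a unicode string, an explicit conversion avoids relying on implicit coercion. A plain-Python sketch (the obj record and table name mirror the snippet above and are illustrative):

```python
# Illustrative tweet-like record; field names mirror the snippet above.
obj = {'id': 123, 'text': u"hello world", 'id_str': u"123"}

insertquery = ("insert into Tweets_test values ('" + str(obj['id']) + "','" +
               obj['text'] + "','" + str(obj['id_str']) + "');")

# Explicitly encode to bytes before writing, rather than relying on
# Jython's implicit String -> byte[] coercion.
payload = bytearray(insertquery.encode('utf-8'))
```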
06-19-2017
07:57 PM
1 Kudo
The following configuration of ReplaceText should work for you, with the Search Value set to:

```
[\[\]](\{|\})
```

This matches `[{` or `]}` (and also `[}` and `]{`, which shouldn't show up if your input is valid JSON) and replaces it with whichever curly brace it found. Note that this is a fairly specific solution, where the array is the last element of an object (i.e. the end pattern is array-end followed by object-end, not the reverse). A more forgiving solution (for your input JSON) might be to use the following Chain spec in a JoltTransformJSON processor:

```json
[
  {
    "operation": "shift",
    "spec": {
      "result": {
        "curves": {
          "*": {
            "@": "result.curves"
          }
        },
        "*": "result.&"
      },
      "*": "&"
    }
  }
]
```

This "hoists" the object in the 1-element array up one level, achieving the same result as the ReplaceText pattern above.
06-19-2017
06:24 PM
You can annotate a method (taking zero or one arguments, the one being a ProcessContext) with @OnStopped; it will be called when the processor is stopped. See the Component Lifecycle section of the Developer Guide for more details.
06-19-2017
01:03 AM
Is SMPPServerSimulator.run() an asynchronous method, meaning it returns immediately? If so, the framework may have already committed your session (after returning from the onTrigger() call in AbstractProcessor), in which case the transfer() either didn't work (but it was too late to throw a runtime exception up to the framework) or it did work but the session now needs another call to commit(). Processors that use a separate thread/lifecycle to manage I/O are tricky to integrate into the NiFi architecture; it takes great care to ensure operations are performed in an order consistent with the behavior of both the separate entity and the NiFi framework. If the run() method is synchronous, then something else is going on, but it still seems related to the session not being committed. Can you attach a debugger and see if session.commit() gets called after your session.transfer()?
06-15-2017
10:41 PM
What version of NiFi/HDF are you using? As of NiFi 1.2.0 / HDF 3.0.0, PutHiveQL supports multiple statements (via NIFI-3031), and there is also an EnforceOrder processor (via NIFI-3414), which could be configured with the fragment.index attribute as the Order Attribute and ${fragment.identifier} as the Group Identifier. Prior to NiFi 1.2.0, you can try adding an UpdateAttribute processor between SplitContent and PutHiveQL, setting a "priority" attribute to ${fragment.index}, then using a PriorityAttributePrioritizer on the SplitContent -> UpdateAttribute -> PutHiveQL connections. I'm not sure this works as-is, because the documentation suggests the priority comparison is lexicographical rather than numeric. If so, you'd need some Expression Language functions or an ExecuteScript processor to left-pad the fragment.index values with zeros so they are all the same length.
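The left-padding step can be sketched as follows (a plain-Python illustration of what an ExecuteScript body or EL expression would compute; the width of 10 is an arbitrary assumption):

```python
def pad_fragment_index(index, width=10):
    """Zero-pad a fragment.index value so lexicographic order matches numeric order."""
    return str(index).zfill(width)

# Without padding, lexicographic order would put "10" before "2";
# with padding, sorting the strings matches the numeric order.
padded = [pad_fragment_index(i) for i in (2, 10, 1)]
ordered = sorted(padded)
```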