Member since: 11-16-2015
Posts: 911
Kudos Received: 668
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 703 | 09-30-2025 05:23 AM |
| | 1076 | 06-26-2025 01:21 PM |
| | 931 | 06-19-2025 02:48 PM |
| | 1102 | 05-30-2025 01:53 PM |
| | 12283 | 02-22-2024 12:38 PM |
11-28-2017
06:42 PM
What version of NiFi are you using? Is the "value" column in your database table a String or a Float/Double? What processor(s) are you using to read from the database? If using ExecuteSQL, could you do something like the following?

```sql
SELECT metric, CAST(value AS DOUBLE) AS value, timestamp, tags FROM myTable
```

Alternatively, as of NiFi 1.2.0 (HDF 3.0) you can use the JoltTransformJSON processor to do type conversion (see an example here). Also, if you know what the schema is supposed to be, you could use ConvertRecord with a JsonRecordSetWriter associated with the "correct" schema. The reader can be an AvroReader which uses the Embedded Schema.
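For the JoltTransformJSON approach, a minimal spec sketch (assuming the field is named "value" as above) using the modify-overwrite-beta operation's toDouble function:

```json
[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "value": "=toDouble"
    }
  }
]
```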
11-28-2017
06:37 PM
Is it possible to share your nifi-app.log on this question? Also, does this driver work from other utilities (Squirrel SQL, e.g.)?
11-27-2017
04:21 PM
Is there anything else underneath that stack trace in nifi-app.log? There is usually a "Caused by" section with a ClassNotFoundException or something like that.
11-16-2017
04:53 PM
You can do it without a schema registry if your readers and writers use the "Use 'Schema Text' Property" strategy and you hardcode the schema into the Schema Text property. Since you're using the same schema for both reader and writer, it's easier to maintain in a registry, but it's only a simple copy-paste if you'd rather not use one.
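As an illustration (the record and field names here are hypothetical), the same Avro schema text could be pasted into both the reader's and the writer's Schema Text property:

```json
{
  "type": "record",
  "name": "example",
  "namespace": "nifi",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
```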
11-16-2017
02:49 PM
3 Kudos
It appears you want to set the destination path to the value of type, followed by the value of id, followed by data.txt, and in the content of that file you want the single-element JSON array containing the object that provided the values. If that is the case: as of NiFi 1.3.0, there is a PartitionRecord processor which will do most of what you want. You can create a JsonReader using the following example schema:

```json
{
  "type": "record",
  "name": "test",
  "namespace": "nifi",
  "fields": [
    {"name": "type", "type": "string"},
    {"name": "id", "type": "string"},
    {"name": "content", "type": "string"}
  ]
}
```

You can also create a JsonRecordSetWriter that inherits the schema (as of NiFi 1.4.0) or uses the same one (prior to NiFi 1.4.0). Then in PartitionRecord you would create two user-defined properties, say record.type and record.id, configured with the corresponding RecordPath expressions. Given your example data, you will get 4 flow files, each containing the data from the 4 groups you mention above. Additionally, you will have record.type and record.id attributes on those flow files. You can route them to UpdateAttribute, where you set filename to data.txt and absolute.path to /${record.type}/${record.id}. Then you can send them to PutHDFS, where you set the Directory to ${absolute.path}.
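A likely configuration for the two user-defined properties in PartitionRecord, assuming the field names from the schema above, maps each property to a RecordPath:

```
record.type  ->  /type
record.id    ->  /id
```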
11-16-2017
02:17 PM
1 Kudo
Thanks very much! I hope to write another series for InvokeScriptedProcessor, ScriptedReportingTask, ScriptedReader, and ScriptedRecordSetWriter someday 🙂
11-14-2017
11:36 PM
You don't need your own sys.path.append calls; you can just put the directories in a comma-separated list in the Module Directory property of ExecuteScript, and it will call sys.path.append for you. However, because it is Jython, if any of the imports (or any of their dependencies) are native CPython modules, then you won't be able to use them in ExecuteScript. All scripts and modules (and their dependencies) must be pure Python. For your exact error, I'd have to see the script (where is "module" defined?), but I suspect that one of these libraries is not pure Python.
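In effect, the Module Directory property does the equivalent of the following (the directory names here are hypothetical):

```python
import sys

# What ExecuteScript's Module Directory property does for you, given
# a comma-separated property value like "/opt/pylibs/core,/opt/pylibs/extra":
module_directory = "/opt/pylibs/core,/opt/pylibs/extra"
for directory in module_directory.split(","):
    sys.path.append(directory.strip())
```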
11-14-2017
11:32 PM
Also, depending on what your stored procedure looks like, you may be able to use ExecuteSQL or PutSQL. However, they do not support setting output parameters, and I'm not sure whether they support input parameters. If your procedure call is hard-coded, then ExecuteSQL should work if it returns a ResultSet, and PutSQL should work if it doesn't. Otherwise the above answer is the best bet.
11-14-2017
02:06 PM
1 Kudo
I'm not familiar with the innards of either Groovy or Jython, but I am guessing that Jython is slower for the following reasons:

1) Groovy was built "for the JVM" and leverages/integrates with Java more cleanly.
2) Jython is an implementation of Python for the JVM. Looking briefly at the code, it appears to go back and forth between the Java and Python idioms, so it is more "emulated" than Groovy.
3) Apache Groovy has a large, very active community that consistently works to improve the performance of the code, both compiled and interpreted.

In my own experience, Groovy and JavaScript (Nashorn) perform much better in the scripted processors than Jython or JRuby. If you choose Jython, there are still a couple of things you can do to improve performance:

- Use InvokeScriptedProcessor (ISP) instead of ExecuteScript. ISP is faster because it loads the script once and then invokes methods on it, whereas ExecuteScript evaluates the script each time. I have an ISP template in Jython which should make porting your ExecuteScript code easier.
- Use ExecuteStreamCommand with command-line Python instead. You won't have the flexibility of accessing attributes, processor state, etc., but if you're just transforming content, you should find ExecuteStreamCommand with Python faster.
- No matter which language you choose, you can often improve performance by using session.get(int) instead of session.get(). That way, if there are a lot of flow files in the queue, you could call session.get(1000) or something, and process up to 1000 flow files per execution. If your script has a lot of overhead, handling multiple flow files per execution can significantly improve performance.
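The session.get(int) batching idea can be sketched in plain Python (the queue and function here are illustrative stand-ins, not the NiFi API):

```python
def on_trigger(queue, batch_size=1000):
    """Pull up to batch_size items per execution, like session.get(1000),
    amortizing the per-execution script overhead across many flow files."""
    batch = queue[:batch_size]
    del queue[:batch_size]
    return batch

# With 2500 queued flow files, three executions drain the queue,
# instead of 2500 single-item executions with session.get().
queue = list(range(2500))
first = on_trigger(queue)   # 1000 items
second = on_trigger(queue)  # 1000 items
third = on_trigger(queue)   # 500 items
```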
11-13-2017
03:23 PM
What do you mean by "add to this JSON a file that I get from FetchFTP"? Is the file you're fetching a JSON file, and you want to add fields to it? Are you Base64-encoding just the JSON from the attributes, or the entire file after adding to it?

If the incoming file (from FTP) is JSON and you can get your attributes added to that flow file, then (as of NiFi 1.2.0 / HDF 3.0) you can use JoltTransformJSON to inject your individual attributes as fields into your JSON document (instead of AttributesToJSON).

If you have too many attributes for that, your options are a bit more limited. In NiFi 1.3.0, you can use UpdateRecord to add the JSON from an attribute into a field in the other JSON document. You can also do this manually with ReplaceText. However, one of the two JSON objects must be in an attribute: whichever of the two (from AttributesToJSON or FetchFTP) is smaller, you can get that object first and use ExtractText to put the whole thing into an attribute. Note that attributes have limited size and introduce more memory usage, so beware of large JSON objects in attributes. If one of them fits in an attribute, though, you can use the UpdateRecord or ReplaceText processor as described.

If you need to just encode one of the JSON objects: if it is in an attribute, you can use UpdateAttribute with the base64Encode Expression Language function; if it is in the content, you can use the Base64EncodeContent processor.
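For the JoltTransformJSON route, a minimal sketch of a spec that injects a single attribute into the document as a field (the attribute name my.attribute and field name injectedField are hypothetical; JoltTransformJSON evaluates Expression Language in the spec):

```json
[
  {
    "operation": "default",
    "spec": {
      "injectedField": "${my.attribute}"
    }
  }
]
```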