About mburgess

mburgess · 12-05-2016

Both PutElasticsearch and FetchElasticsearch use the transport client. There are HTTP versions of both (PutElasticsearchHttp and FetchElasticsearchHttp) that use the REST API instead.

mburgess · 12-02-2016

What error(s) are you seeing? If the error mentions Avro and your column names are in Chinese, it's likely that Avro does not accept them as field names. This may be alleviated in NiFi 1.1.0 with NIFI-2262, but that change simply replaces non-Avro-compatible characters with underscores, so you may then hit a "duplicate field" exception. In that case you would need column aliases in your SELECT statement to give the columns Avro-compatible names.
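
To see why underscore replacement can produce duplicate fields, here is a minimal Python sketch of the idea. It applies the Avro naming rule (a name must start with [A-Za-z_] and contain only [A-Za-z0-9_]) by replacing each invalid character with an underscore; the helper names are hypothetical, and NiFi's actual NIFI-2262 code may differ in details such as how it treats a leading digit.

```python
import re
from typing import Iterable, Set

# Avro names: first char [A-Za-z_], remaining chars [A-Za-z0-9_].
_INVALID = re.compile(r"[^A-Za-z0-9_]")


def to_avro_name(column: str) -> str:
    # Hypothetical sanitizer: replace every invalid character with "_".
    name = _INVALID.sub("_", column)
    if not re.match(r"[A-Za-z_]", name):  # empty or starts with a digit
        name = "_" + name
    return name


def find_duplicates(columns: Iterable[str]) -> Set[str]:
    # Report sanitized names that more than one column maps to.
    seen: Set[str] = set()
    dups: Set[str] = set()
    for col in columns:
        name = to_avro_name(col)
        if name in seen:
            dups.add(name)
        seen.add(name)
    return dups


print(find_duplicates(["姓名", "年龄", "id"]))
```

Both two-character Chinese names collapse to "__", which is exactly the kind of clash an alias avoids (for example, SELECT 姓名 AS name, 年龄 AS age ... with hypothetical English aliases).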

mburgess · 11-30-2016

Depending on how fast the flow files are coming through, using a timestamp might give more than one flow file the same filename. You could use ${uuid}, which is the UUID of the flow file (guaranteed to be unique), or ${nextInt()}, which is an auto-incrementing value.
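
The collision problem is easy to reproduce outside NiFi. This Python sketch uses hypothetical helpers as stand-ins: a millisecond timestamp for the timestamp-based name, uuid.uuid4() for ${uuid}, and an in-process counter for ${nextInt()}. It does not call NiFi's Expression Language; it only shows why a timestamp is not enough when several files arrive in the same millisecond.

```python
import itertools
import uuid

# Hypothetical stand-in for ${nextInt()}: an in-process counter starting at 0.
_counter = itertools.count()


def timestamp_name(now_ms: int) -> str:
    # Analogous to naming a file after a millisecond timestamp.
    return f"out_{now_ms}.txt"


def uuid_name() -> str:
    # Analogous to ${uuid}: here, Python's random (version 4) UUID.
    return f"out_{uuid.uuid4()}.txt"


def counter_name() -> str:
    # Analogous to ${nextInt()}: a one-up number per file.
    return f"out_{next(_counter)}.txt"


if __name__ == "__main__":
    same_ms = 1480464000000  # three flow files arriving in the same millisecond
    print(len({timestamp_name(same_ms) for _ in range(3)}))  # 1: all collide
    print(len({uuid_name() for _ in range(3)}))  # 3
    print(len({counter_name() for _ in range(3)}))  # 3
```

Note that a plain in-process counter like this one restarts at 0 whenever the process restarts, so on its own it is only unique within a single run.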

mburgess · 11-28-2016

To follow up on your first question above: Groovy has many features and supports multiple paradigms, such as object-oriented (like Java) and functional programming. It can also be used as a scripting language, which is why you don't often see explicitly defined classes and methods in the example scripts (both in @Artem Ervits' answer and the examples on my blog). In ExecuteScript, the Script Body is treated as a script, so it can be evaluated without defining a top-level class. Under the hood, Groovy wraps the script in a Script object (so that it obeys the JVM rules).

For your second question, you can add additional logic inside the inputStream.eachLine closure to do any transformations. So instead of just writing out a[1] and a[2] after you've tokenized each line, you can do additional things such as:

    if (a[2].startsWith('3300')) a[2] = a[2].replaceFirst('33', '22')
    if (a[2].startsWith('0')) a[2] = a[2].replaceFirst('0', '00212')

Then you can output the space-separated columns.
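
For readers more at home in Python, here is the same per-line logic as a standalone function. It is a sketch assuming whitespace-separated tokens and that a[1] and a[2] (zero-based, as in Groovy) are the columns being written out; transform_line and the sample lines are hypothetical. Groovy's replaceFirst takes a regex, but since '33' and '0' contain no special characters, Python's literal str.replace(old, new, 1) behaves the same here.

```python
def transform_line(line: str) -> str:
    # Tokenize on whitespace (assumption about the input format).
    a = line.split()
    # Rule 1: values starting with 3300 get their leading 33 changed to 22.
    if a[2].startswith("3300"):
        a[2] = a[2].replace("33", "22", 1)
    # Rule 2: values starting with 0 get that first 0 replaced by 00212.
    if a[2].startswith("0"):
        a[2] = a[2].replace("0", "00212", 1)
    # Output the space-separated columns.
    return f"{a[1]} {a[2]}"


for line in ["1 alice 330012345", "2 bob 0612345678", "3 carol 12345"]:
    print(transform_line(line))
```

The rules run in order, as in the Groovy version, so a value rewritten by the first rule is then checked by the second.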

mburgess · 11-22-2016

Yes, PutSQL accepts any SQL statement that does not return a result set (except callable statements such as stored procedures), so DDL/DML commands such as LOAD INTO or CREATE TABLE are supported.
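
The "does not return a result set" distinction can be illustrated with Python's DB-API, where cursor.description is None for statements that return no rows. This is only an analogy using the standard-library sqlite3 module and a hypothetical helper; PutSQL itself runs on the JVM through JDBC.

```python
import sqlite3


def returns_result_set(conn: sqlite3.Connection, sql: str) -> bool:
    # DB-API: cursor.description is None for statements that do not
    # return rows (DDL such as CREATE TABLE, DML such as INSERT).
    cur = conn.execute(sql)
    return cur.description is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for sql in ("CREATE TABLE t (x INTEGER)",
                "INSERT INTO t VALUES (1)",
                "SELECT x FROM t"):
        print(sql, "->", returns_result_set(conn, sql))
```

The first two statements are the kind PutSQL is meant for; the SELECT, which returns a result set, belongs in a processor such as ExecuteSQL.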

mburgess · 11-21-2016

This is a bug; I have filed it as NIFI-3076. Workarounds might include changing the domain_id column to be signed (which is probably not desired), or using a cast() function to convert it to a data type (a long, for example) that will be handled better for the time being. If you use a cast() or other function, you may want a column alias to ensure the column/field name is the one you want.
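
Why the alias matters: without one, many databases name a computed column after the expression itself. This Python sketch uses an in-memory SQLite table as a stand-in (the table name t and the value are hypothetical). SQLite has no UNSIGNED types, so the CAST here is effectively a no-op; the point is only the resulting column name. The exact CAST target type you would use depends on your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (domain_id INTEGER)")
conn.execute("INSERT INTO t VALUES (4294967295)")  # max unsigned 32-bit value


def column_names(sql: str) -> list:
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]


# Without an alias, SQLite names the column after the expression text.
print(column_names("SELECT CAST(domain_id AS INTEGER) FROM t"))
# With an alias, the field keeps the name you want.
print(column_names("SELECT CAST(domain_id AS INTEGER) AS domain_id FROM t"))
```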

mburgess · 11-21-2016

Paramiko uses Crypto, which is a native module, so it is not pure Python either and cannot be used in ExecuteScript. ExecuteProcess or ExecuteStreamCommand should work, though.

mburgess · 11-21-2016

According to the Python documentation, using key_file and cert_file is deprecated; the recommendation is to pass in an SSL context (one that has been configured by calling load_cert_chain) instead. You'll need a certfile and a keyfile for that too, which you can get using various openssl commands (assuming you have openssl installed). For example, to export a client private key from a PKCS12 keystore to a PEM file (quote the file name, since it contains a space):

    openssl pkcs12 -in "CN=<something_you_typed>_OU=Apache NiFi.p12" -nodes -nocerts -out client.key

Or to export a server private key from a JKS keystore to a PEM file:

    keytool -importkeystore -srckeystore <keystore.jks> -destkeystore keystore.p12 -deststoretype PKCS12
    openssl pkcs12 -in keystore.p12 -nodes -nocerts -out nifi.key

Or to export a CA cert from a JKS keystore to a PEM file:

    keytool -export -alias <your_alias> -file ca.der -keystore <truststore.jks>
    openssl x509 -inform der -in ca.der -out ca.pem
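
Here is a minimal CPython sketch of the context-based approach, assuming PEM files like the ones exported above: client.key as the key, ca.pem as the CA certificate, and a client certificate in a hypothetical client.pem. The helper names, host, and port are placeholders, not part of any NiFi API.

```python
import http.client
import ssl
from typing import Optional


def make_client_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None,
                        cafile: Optional[str] = None) -> ssl.SSLContext:
    # Verifies the server certificate (against cafile if given, otherwise
    # the system defaults) and checks the hostname.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile:
        # Client certificate and private key in PEM form.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def make_connection(host: str, port: int,
                    ctx: ssl.SSLContext) -> http.client.HTTPSConnection:
    # Pass the context instead of the deprecated key_file/cert_file arguments.
    # No network I/O happens until a request is made.
    return http.client.HTTPSConnection(host, port, context=ctx)


# Example wiring (placeholder file names, host and port):
# ctx = make_client_context("client.pem", "client.key", "ca.pem")
# conn = make_connection("nifi.example.com", 9443, ctx)
```

From there, conn.request(...) and conn.getresponse() work as usual.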

mburgess · 11-21-2016

PutSQL has a mechanism for batching together statements that were split by processors such as SplitText. Set the "Support Fragmented Transactions" property to true, and PutSQL will wait until all flow files with the same fragment.identifier have arrived, then it will process them all as a single batch. There has also been talk of implementing the same improvement for PutSQL as is being done for PutHiveQL (NIFI-3031), to support multiple statements from a single flow file. Please feel free to file a Jira for this if you like.
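
The waiting behaviour described above can be sketched in a few lines of Python. This is an illustration of the grouping idea only, not PutSQL's code: FragmentBatcher is a hypothetical class, and it assumes each flow file carries the fragment.identifier, fragment.index and fragment.count attributes that SplitText writes.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class FragmentBatcher:
    """Hypothetical sketch: hold statements until all fragments of a split arrive."""

    def __init__(self) -> None:
        # fragment.identifier -> {fragment.index: statement}
        self._pending: Dict[str, Dict[int, str]] = defaultdict(dict)

    def offer(self, attributes: Dict[str, str], statement: str) -> Optional[List[str]]:
        frag_id = attributes["fragment.identifier"]
        count = int(attributes["fragment.count"])
        index = int(attributes["fragment.index"])
        self._pending[frag_id][index] = statement
        if len(self._pending[frag_id]) < count:
            return None  # still waiting for the rest of this split
        # All fragments present: release them as one batch, in index order.
        batch = self._pending.pop(frag_id)
        return [batch[i] for i in sorted(batch)]
```

A fuller version would also have to decide what to do with groups that never complete (for example, via a timeout); this sketch ignores that.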

mburgess · 11-18-2016

QueryCassandra does not support user-defined types; instead, it converts their values to strings. As a workaround, you can use ExecuteScript to parse those strings back into values. Here is an example Groovy script to accomplish this:

    import groovy.json.*

    def flowFile = session.get()
    if (!flowFile) return
    def directReport = flowFile.getAttribute('direct_report')
    def json = new JsonSlurper().setType(JsonParserType.LAX).parseText(directReport)
    json.each { key, value ->
        // Attribute values must be Strings, so coerce numbers, booleans, etc.
        flowFile = session.putAttribute(flowFile, key, value as String)
    }
    session.provenanceReporter.modifyAttributes(flowFile)
    session.transfer(flowFile, REL_SUCCESS)

This script assumes you have used something like EvaluateJsonPath to extract $.results[0].directReports[0] into an attribute named 'direct_report'. It parses the JSON object and adds an attribute to the flow file for each key/value pair in the object. You can adjust this to work with content rather than attributes; I have examples of various scripts on my blog.

Last Visited	01-16-2026 01:45 PM

Member Since	11-16-2015 02:21 PM
Posts	911
Kudos received	662

Cloudera Community

Re: Compare data within the JSON using NIFI

Re: how to join three csv files like sql on condit...

Re: How to see the Data Provenance and Lineage in ...

Re: Apache NiFi - RouteText has no matches

Re: Nifi Building error when creating a brand new ...

Re: PutElasticHttp Connection reset by peer: socke...

Re: How to insert data into Hive using NiFi ?

Re: NIFI splittext to split the single file into m...

Re: Groovy ExcuteScript example ?

Re: SQL loader using PutSQL

Re: Nifi 1.0: ExecuteSQL having issues with UNSIGN...

Re: cannot use numpy or scipy in python in nifi ex...

Re: How to access kerberos NIFI cluster with nifi ...

Re: SQL loader using PutSQL

Re: Problem in JSON result with QueryCassandra pr...