Member since
11-16-2015
905
Posts
664
Kudos Received
249
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 252 | 09-30-2025 05:23 AM |
|  | 665 | 06-26-2025 01:21 PM |
|  | 510 | 06-19-2025 02:48 PM |
|  | 758 | 05-30-2025 01:53 PM |
|  | 11000 | 02-22-2024 12:38 PM |
06-07-2017
12:57 PM
What version of NiFi/HDF are you using? Also, are all those HiveConnectionPool instances valid (meaning they are used in various parts of your flow, in process groups, etc.)? If not, try deleting the invalid ones. Once you configure and save your FPCluster-HiveConnectionPool, it should be available to the Hive processors, so I'm not sure why you're not seeing it.
06-07-2017
02:00 AM
1 Kudo
NiFi doesn't come with a PostgreSQL driver; you will need to download one and point to its location in the "Database Driver Jar Url" property of the DBCPConnectionPool configuration dialog above. If you have instead placed the driver in NiFi's lib/ directory, note that this can cause classloading problems and is not recommended. Best practice is to keep JDBC drivers in a separate location and refer to that location in the "Database Driver Jar Url" property.
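For illustration, a PostgreSQL DBCPConnectionPool might end up configured roughly like this (the host, database name, credentials, and driver path/version are placeholders, not values from your environment):

```
Database Connection URL    : jdbc:postgresql://dbhost:5432/mydb
Database Driver Class Name : org.postgresql.Driver
Database Driver Jar Url    : file:///opt/nifi/drivers/postgresql-42.2.5.jar
Database User              : nifi
Password                   : ********
```

The key point is that the "Database Driver Jar Url" points at the driver JAR outside NiFi's own lib/ directory.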
06-07-2017
01:27 AM
1 Kudo
The HiveConnectionPool controller service needs to be configured to connect to a Hive instance. If you edit the controller service (using the pencil icon on the right side of the controller service entry in your Process Group Configuration window above), it will list the properties (required properties are in bold). The Database Connection URL is probably something like jdbc:hive2://sandbox.hortonworks.com:10000/default, and the Hive Configuration Resources property should point at a comma-separated list of configuration files (usually at least core-site.xml and hive-site.xml). If you have username/password authentication set up for your Hive instance, you will need to supply values for those properties as well; alternatively, if your Hive instance is secured with Kerberos, supply values for the Kerberos properties instead.

Once your HiveConnectionPool has been configured correctly (and saved by hitting the Apply button), enable it by clicking the lightning bolt icon (also on the right side, near the pencil icon) and then the Enable button.

Finally, return to the canvas and either route the "success" and/or "failure" connections from PutHiveQL to some other processor, or auto-terminate the relationship(s) by opening the Configuration dialog for PutHiveQL and selecting the checkboxes for the relationship(s) under the "Auto-terminate relationships" section. At that point your PutHiveQL processor should be valid (you will see a red square icon in the upper-left corner of the processor instead of the yellow triangle).
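As a sketch (the file paths and user are placeholders; your cluster's locations will differ), an unsecured HiveConnectionPool might look like:

```
Database Connection URL      : jdbc:hive2://sandbox.hortonworks.com:10000/default
Hive Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hive/conf/hive-site.xml
Database User                : hive
```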
06-05-2017
01:58 PM
1 Kudo
You can use the JoltTransformJSON processor for this. Here is a Chain spec that should get you what you want:

```json
[
  {
    "operation": "shift",
    "spec": {
      "0": {
        "*": "header"
      },
      "*": "data[]"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "data": {
        "*": {
          "*": {
            "@0": "[&2].@(4,header[&])"
          }
        }
      }
    }
  }
]
```

The first operation splits the top-level array into "header" (containing the first element) and "data" (containing the rest). The second operation creates an array of objects, each object associating the keys from the header with the values from the corresponding element of the "data" array. With your input above, I get the following output:

```json
[ {
  "host" : "DUSTSADMIN.ads.xyz.de",
  "host_icons" : "menu",
  "host_state" : "UP",
  "num_services_crit" : "0",
  "num_services_ok" : "28",
  "num_services_pending" : "0",
  "num_services_unknown" : "0",
  "num_services_warn" : "0"
}, {
  "host" : "DUSTSVMDC01.ads.xyz.de",
  "host_icons" : "menu",
  "host_state" : "UP",
  "num_services_crit" : "0",
  "num_services_ok" : "34",
  "num_services_pending" : "0",
  "num_services_unknown" : "0",
  "num_services_warn" : "0"
}, {
  "host" : "DUSTSVMDC02.ads.xyz.de",
  "host_icons" : "menu",
  "host_state" : "UP",
  "num_services_crit" : "0",
  "num_services_ok" : "34",
  "num_services_pending" : "0",
  "num_services_unknown" : "0",
  "num_services_warn" : "0"
} ]
```
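Outside NiFi, the reshaping that this Jolt chain performs can be sketched in a few lines of Python (a hypothetical illustration with made-up sample rows, not part of the flow itself):

```python
# First row of the 2-D array is the header; every remaining row
# is zipped with the header into a key/value object, mirroring
# the two shift operations in the Jolt chain above.
rows = [
    ["host", "host_state", "num_services_ok"],
    ["DUSTSADMIN.ads.xyz.de", "UP", "28"],
    ["DUSTSVMDC01.ads.xyz.de", "UP", "34"],
]

header, data = rows[0], rows[1:]
records = [dict(zip(header, row)) for row in data]

print(records[0]["host"])             # DUSTSADMIN.ads.xyz.de
print(records[1]["num_services_ok"])  # 34
```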
06-01-2017
03:27 PM
1 Kudo
FetchElasticsearch is used to get a single document from an ES cluster. Each document in ES has a document identifier (or "_id") associated with it, and that identifier is what should be supplied to the Document Identifier property. If you don't know the document identifier for the document(s) you're looking for, then QueryElasticsearchHttp is your best bet. It allows you to use the Query String "mini-language" to search for fields with desired values (see here for more information). You can then parse the results using any number of processors, such as EvaluateJsonPath to get individual fields from the results, SplitJson if there are multiple results, etc.
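As an illustration of the query string syntax (the field names here are hypothetical), a Query property value for QueryElasticsearchHttp could look like:

```
status:active AND retries:[3 TO 10]
```

This matches documents whose status field is "active" and whose retries field falls between 3 and 10, without needing to know their document identifiers.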
05-31-2017
03:26 PM
You can add dynamic properties in the InvokeHttp processor's configuration dialog as described here.
05-24-2017
01:42 PM
2 Kudos
As of NiFi 1.2.0 you may be able to use ConvertRecord to do this, with a JsonTreeReader and a CSVRecordSetWriter (with a Record Separator of comma and a Value Separator of a single space). Prior to 1.2.0 (or if the above approach doesn't work), you can use ExecuteScript. Here is a sample Groovy script that will read all the "term" values from the incoming JSON object and add an attribute called "terms" containing the comma-separated list:

```groovy
def flowFile = session.get()
if (!flowFile) return
// Read and parse the incoming flow file content as JSON
def input = session.read(flowFile)
def json = new groovy.json.JsonSlurper().parse(input)
def terms = json.results.collect { it.term }.join(',')
input.close()
// Store the comma-separated list in a "terms" attribute
flowFile = session.putAttribute(flowFile, 'terms', terms)
session.transfer(flowFile, REL_SUCCESS)
```

If instead you need to replace the content of the flow file with the comma-separated list:

```groovy
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return
// Overwrite the flow file content with the joined terms
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def json = new groovy.json.JsonSlurper().parse(inputStream)
    def terms = json.results.collect { it.term }.join(',')
    outputStream.write(terms.bytes)
} as StreamCallback)
flowFile = session.putAttribute(flowFile, 'mime.type', 'text/csv')
session.transfer(flowFile, REL_SUCCESS)
```
05-19-2017
07:16 PM
I believe you'll also need the following in your my.cnf:

```
binlog_format=row
```
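For context, a minimal [mysqld] section enabling row-based binary logging might look like this (the server-id value and log file base name are placeholders; adjust for your server):

```
[mysqld]
server-id        = 1
log-bin          = mysql-bin
binlog_format    = row
binlog_row_image = full
```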
05-18-2017
02:10 PM
Have you performed any INSERT, UPDATE, or DELETE operations since you enabled binary logging? You probably don't need to include Begin/Commit events unless you are doing auditing or your target DB needs them. In general, if you ever want to "reset" the CDC processor so it retrieves all the binlog records again, set Retrieve All Records to true and clear the processor's state (i.e. right-click on the stopped processor, choose View State, then Clear State).
05-16-2017
09:22 PM
You can still use RouteOnAttribute to route the flow files with missing values somewhere separate from the files that have all the values populated. In UpdateAttribute, you can call the attribute whatever you want the message to be, and for the JOLT transform, you can have multiple entries in the spec, each matching a separate entry from the JSON.
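For illustration (the attribute name is hypothetical), a dynamic property on RouteOnAttribute that catches flow files missing a value might use the Expression Language like this:

```
missing_value : ${my.attribute:isEmpty()}
```

Flow files for which the expression evaluates to true are routed to the "missing_value" relationship; everything else follows the "unmatched" relationship.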