Member since
11-16-2015
905
Posts
664
Kudos Received
249
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 252 | 09-30-2025 05:23 AM |
|  | 665 | 06-26-2025 01:21 PM |
|  | 510 | 06-19-2025 02:48 PM |
|  | 758 | 05-30-2025 01:53 PM |
|  | 11000 | 02-22-2024 12:38 PM |
06-07-2017
12:57 PM
What version of NiFi/HDF are you using? Also, are all those HiveConnectionPool instances valid (meaning they are used in various parts of your flow, in process groups, etc.)? If not, try deleting the invalid ones. Once you configure and save your FPCluster-HiveConnectionPool, it should be available to the Hive processors, so I'm not sure why you're not seeing it.
06-07-2017
02:00 AM
1 Kudo
NiFi doesn't come with a PostgreSQL driver; you will need to download one and point to its location in the "Database Driver Jar Url" property of the DBCPConnectionPool configuration dialog above. If you have instead placed the driver in NiFi's lib/ directory, note that this can cause classloading problems and is not recommended. Best practice is to keep JDBC drivers in a separate location and refer to that location in the "Database Driver Jar Url" property.
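For illustration, a PostgreSQL DBCPConnectionPool might end up configured roughly like this (the host, database name, credentials, and driver path/version are placeholders, not values from your environment):

```
Database Connection URL    : jdbc:postgresql://dbhost:5432/mydb
Database Driver Class Name : org.postgresql.Driver
Database Driver Jar Url    : file:///opt/nifi/drivers/postgresql-42.2.5.jar
Database User              : nifi
Password                   : ********
```

The key point is that the "Database Driver Jar Url" points at the driver JAR outside NiFi's own lib/ directory.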
06-07-2017
01:27 AM
1 Kudo
The HiveConnectionPool controller service needs to be configured to connect to a Hive instance. If you edit the controller service (using the pencil icon on the right side of the controller service entry in your Process Group Configuration window above), it will list the properties (required properties are in bold). The Database Connection URL is probably something like jdbc:hive2://sandbox.hortonworks.com:10000/default, and the Hive Configuration Resources property should point at a comma-separated list of configuration files (usually at least core-site.xml and hive-site.xml). If you have username/password authentication set up for your Hive instance, you will need to supply values for those properties as well; alternatively, if your Hive instance is secured with Kerberos, supply values for the Kerberos properties instead.

Once your HiveConnectionPool has been configured correctly (and saved by hitting the Apply button), enable it by clicking the lightning bolt icon (also on the right side, near the pencil icon) and then the Enable button.

Finally, return to the canvas and either route the "success" and/or "failure" connections from PutHiveQL to some other processor, or auto-terminate the relationship(s) by opening the Configuration dialog for PutHiveQL and selecting the checkboxes for the relationship(s) under the "Auto-terminate relationships" section. At that point your PutHiveQL processor should be valid (you will see a red square icon in the upper-left corner of the processor instead of the yellow triangle).
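As a sketch (the file paths and user are placeholders; your cluster's locations will differ), an unsecured HiveConnectionPool might look like:

```
Database Connection URL      : jdbc:hive2://sandbox.hortonworks.com:10000/default
Hive Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hive/conf/hive-site.xml
Database User                : hive
```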
06-05-2017
01:58 PM
1 Kudo
You can use the JoltTransformJSON processor for this. Here is a Chain spec that should get you what you want:

```json
[
  {
    "operation": "shift",
    "spec": {
      "0": {
        "*": "header"
      },
      "*": "data[]"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "data": {
        "*": {
          "*": {
            "@0": "[&2].@(4,header[&])"
          }
        }
      }
    }
  }
]
```

The first operation splits the top-level array into "header" (containing the first element) and "data" (containing the rest). The second operation creates an array of objects, each object associating the keys from the header with the values from the corresponding element of the "data" array. With your input above, I get the following output:

```json
[ {
  "host" : "DUSTSADMIN.ads.xyz.de",
  "host_icons" : "menu",
  "host_state" : "UP",
  "num_services_crit" : "0",
  "num_services_ok" : "28",
  "num_services_pending" : "0",
  "num_services_unknown" : "0",
  "num_services_warn" : "0"
}, {
  "host" : "DUSTSVMDC01.ads.xyz.de",
  "host_icons" : "menu",
  "host_state" : "UP",
  "num_services_crit" : "0",
  "num_services_ok" : "34",
  "num_services_pending" : "0",
  "num_services_unknown" : "0",
  "num_services_warn" : "0"
}, {
  "host" : "DUSTSVMDC02.ads.xyz.de",
  "host_icons" : "menu",
  "host_state" : "UP",
  "num_services_crit" : "0",
  "num_services_ok" : "34",
  "num_services_pending" : "0",
  "num_services_unknown" : "0",
  "num_services_warn" : "0"
} ]
```
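Outside NiFi, the reshaping that this Jolt chain performs can be sketched in a few lines of Python (a hypothetical illustration with made-up sample rows, not part of the flow itself):

```python
# First row of the 2-D array is the header; every remaining row
# is zipped with the header into a key/value object, mirroring
# the two shift operations in the Jolt chain above.
rows = [
    ["host", "host_state", "num_services_ok"],
    ["DUSTSADMIN.ads.xyz.de", "UP", "28"],
    ["DUSTSVMDC01.ads.xyz.de", "UP", "34"],
]

header, data = rows[0], rows[1:]
records = [dict(zip(header, row)) for row in data]

print(records[0]["host"])             # DUSTSADMIN.ads.xyz.de
print(records[1]["num_services_ok"])  # 34
```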
06-01-2017
03:27 PM
1 Kudo
FetchElasticsearch is used to get a single document from an ES cluster. Each document in ES has a document identifier (or "_id") associated with it, and that identifier is what should be supplied to the Document Identifier property. If you don't know the document identifier for the document(s) you're looking for, then QueryElasticsearchHttp is your best bet. It allows you to use the Query String "mini-language" to search for fields with desired values (see here for more information). You can then parse the results using any number of processors, such as EvaluateJsonPath to get individual fields from the results, SplitJson if there are multiple results, etc.
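As an illustration of the query string syntax (the field names here are hypothetical), a Query property value for QueryElasticsearchHttp could look like:

```
status:active AND retries:[3 TO 10]
```

This matches documents whose status field is "active" and whose retries field falls between 3 and 10, without needing to know their document identifiers.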
05-31-2017
03:26 PM
You can add dynamic properties in the InvokeHttp processor's configuration dialog as described here.
05-24-2017
01:42 PM
2 Kudos
As of NiFi 1.2.0 you may be able to use ConvertRecord to do this, with a JsonTreeReader and a CSVRecordSetWriter (with a Record Separator of comma and a Value Separator of a single space). Prior to 1.2.0 (or if the above approach doesn't work), you can use ExecuteScript. Here is a sample Groovy script that will read all the "term" values from the incoming JSON object and add an attribute called "terms" containing the comma-separated list:

```groovy
def flowFile = session.get()
if (!flowFile) return
// Read and parse the incoming flow file content as JSON
def input = session.read(flowFile)
def json = new groovy.json.JsonSlurper().parse(input)
def terms = json.results.collect { it.term }.join(',')
input.close()
// Store the comma-separated list in a "terms" attribute
flowFile = session.putAttribute(flowFile, 'terms', terms)
session.transfer(flowFile, REL_SUCCESS)
```

If instead you need to replace the content of the flow file with the comma-separated list:

```groovy
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return
// Overwrite the flow file content with the joined terms
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def json = new groovy.json.JsonSlurper().parse(inputStream)
    def terms = json.results.collect { it.term }.join(',')
    outputStream.write(terms.bytes)
} as StreamCallback)
flowFile = session.putAttribute(flowFile, 'mime.type', 'text/csv')
session.transfer(flowFile, REL_SUCCESS)
```
05-19-2017
07:16 PM
I believe you'll also need the following in your my.cnf:

```
binlog_format=row
```
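For context, a minimal [mysqld] section enabling row-based binary logging might look like this (the server-id value and log file base name are placeholders; adjust for your server):

```
[mysqld]
server-id        = 1
log-bin          = mysql-bin
binlog_format    = row
binlog_row_image = full
```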
05-18-2017
02:10 PM
Have you performed any INSERT, UPDATE, or DELETE operations since you enabled binary logging? You probably don't need to include Begin/Commit events unless you are doing auditing or your target DB needs them. In general, if you ever want to "reset" the CDC processor so it retrieves all the binlog records again, set Retrieve All Records to true and clear the processor's state (i.e. right-click on the stopped processor, choose View State, then Clear State).
05-16-2017
09:22 PM
You can still use RouteOnAttribute to route the flow files with missing values somewhere separate from the files that have all the values populated. In UpdateAttribute, you can call the attribute whatever you want the message to be, and for the JOLT transform, you can have multiple entries in the spec, each matching a separate entry from the JSON.
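For illustration (the attribute name is hypothetical), a dynamic property on RouteOnAttribute that catches flow files missing a value might use the Expression Language like this:

```
missing_value : ${my.attribute:isEmpty()}
```

Flow files for which the expression evaluates to true are routed to the "missing_value" relationship; everything else follows the "unmatched" relationship.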