Member since: 11-16-2015
Posts: 902
Kudos Received: 664
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 148 | 09-30-2025 05:23 AM |
| | 617 | 06-26-2025 01:21 PM |
| | 452 | 06-19-2025 02:48 PM |
| | 695 | 05-30-2025 01:53 PM |
| | 9705 | 02-22-2024 12:38 PM |
11-18-2016
01:13 PM
QueryCassandra does not support user-defined types, and instead will convert the values to strings. As a workaround, you can use ExecuteScript to parse the strings into values. Here is an example Groovy script to accomplish this:

import groovy.json.*
def flowFile = session.get()
if(!flowFile) return
def directReport = flowFile.getAttribute('direct_report')
def json = new JsonSlurper().setType(JsonParserType.LAX).parseText(directReport)
// Add a flow file attribute for each key/value pair in the JSON object (values coerced to String)
json*.key.each { key ->
    flowFile = session.putAttribute(flowFile, key, json[key] as String)
}
session.provenanceReporter.modifyAttributes(flowFile)
session.transfer(flowFile, REL_SUCCESS)

This script assumes you have used something like EvaluateJsonPath to extract $.results[0].directReports[0] into an attribute named 'direct_report'. It parses the JSON object and adds attributes to the flow file for each key/value pair in the object. You can adjust this to work with content rather than attributes; I have examples of various scripts on my blog.
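For example, a content-based variant might look something like the following rough sketch (untested, and assuming you have already isolated the JSON object as the entire flow file content rather than as an attribute):

import groovy.json.*
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
// Read the entire flow file content as text (assumed to be a single JSON object)
def text = ''
session.read(flowFile, { inStream ->
    text = inStream.getText('UTF-8')
} as InputStreamCallback)
def json = new JsonSlurper().setType(JsonParserType.LAX).parseText(text)
// Promote each top-level key/value pair to a flow file attribute
json*.key.each { key ->
    flowFile = session.putAttribute(flowFile, key, json[key] as String)
}
session.provenanceReporter.modifyAttributes(flowFile)
session.transfer(flowFile, REL_SUCCESS)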
11-18-2016
12:10 PM
1 Kudo
In addition to @Pierre Villard's suggestion: PutHDFS transfers flow files that have been successfully written to HDFS to the "success" relationship, so you can put a processor downstream from PutHDFS (along the "success" relationship), and at that point you can be sure that the file has been successfully written to HDFS and can proceed accordingly.
11-16-2016
10:19 PM
Might need an "AS blob_contents", can't remember
11-16-2016
10:19 PM
2 Kudos
DB2 might be returning a different JDBC type for BLOB than what the processor is expecting, such that it tries to convert the value to a String or something else rather than a byte array. For your workaround, try a column alias for the cast() function, so you can set the name of that column to something Avro will like, such as "blob_contents":

SELECT cast(BLOBTBL.BLOB_CONTENTS as varchar(2000)) blob_contents
FROM BLOB_DECOMP BLOBTBL
fetch first 10 rows only with UR
11-15-2016
04:57 PM
1 Kudo
Does it work if you replace the back-slashes with forward slashes in the Database Driver Jar Url property?
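For example (the path here is purely hypothetical), a value like file:///C:/drivers/mydriver.jar or C:/drivers/mydriver.jar instead of C:\drivers\mydriver.jar.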
11-15-2016
01:26 PM
3 Kudos
What does your current schema look like? If you have a field with a type of something like ["null","int"] then it is being declared as a "nullable union", meaning the value can be null or a valid integer. If instead you use simply "int" for the type, then it should enforce non-null values for that field. If it does not, then the CSV reader from the Kite SDK (used to parse the CSV in the ConvertCSVtoAvro processor) likely treats missing values as empty or default rather than null. If this is the behavior you're seeing, please feel free to file a Jira to improve the handling of missing CSV values.
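For illustration, the two styles of field declaration might look like this in the schema (the record and field names here are just placeholders):

{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    { "name": "optional_count", "type": ["null", "int"], "default": null },
    { "name": "required_count", "type": "int" }
  ]
}

The first field accepts nulls; the second should reject them, subject to the CSV reader behavior described above.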
11-14-2016
06:31 PM
For sufficiently small JSON files, you can use EvaluateJsonPath or ExtractText to get the full body of the document into an attribute before the SplitJson, but keep in mind that this will load the document into memory (rather than being in the content repository and only referenced), and if you modify the flow file, both the original and the new flow file will have a copy in memory. This can get unwieldy pretty quickly. If instead you can determine a smaller portion of the document that is needed, EvaluateJsonPath (with the appropriate JSON Path expression) can store that as an attribute instead.

Alternatively, you might be able to store the original document with PutDistributedMapCache, and then fetch it into an attribute only when it is needed (so the use of UpdateAttribute to delete it when finished is also recommended).

A different approach, if you are comfortable with a scripting language such as Javascript or Groovy, is to use ExecuteScript to invert the behavior of SplitJson; that is, keep the flow file content identical to the original content, and instead store each split value as an attribute in its own flow file. This maintains the original content in each flow file, and as I mentioned the content itself will not be "moved" or copied; instead each flow file maintains a reference to the content (which would be unchanged from the original in this case). A rough sketch of this approach is shown below. If you'd like to see this "inverse" behavior supported in SplitJson (so you can choose whether the splits go in attributes or content), please feel free to file a Jira for this capability.
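Here is that sketch for ExecuteScript with Groovy (untested, and assuming the flow file content is a top-level JSON array; the attribute names split.index and split.value are just placeholders):

import groovy.json.*
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
// Read the original JSON content (assumed to be a top-level array)
def text = ''
session.read(flowFile, { inStream ->
    text = inStream.getText('UTF-8')
} as InputStreamCallback)
def json = new JsonSlurper().parseText(text)
// For each element, clone the flow file (the content is referenced, not copied)
// and store the element as an attribute instead of replacing the content
json.eachWithIndex { element, i ->
    def split = session.clone(flowFile)
    split = session.putAttribute(split, 'split.index', String.valueOf(i))
    split = session.putAttribute(split, 'split.value', JsonOutput.toJson(element))
    session.transfer(split, REL_SUCCESS)
}
session.remove(flowFile)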
11-11-2016
04:58 PM
1 Kudo
What do the resulting HiveQL statement (and attributes) look like? Are you using parameters (with attributes like hiveql.args.N.value and such)? If so, then it appears from looking at the code that it expects a long integer (probably days or seconds from Epoch depending on the data type) for the value, and the appropriate JDBC type value for DATE, TIME, or TIMESTAMP. If parameterized statements don't work, perhaps a ReplaceText to build an explicit HiveQL statement will (such as to remove quotes from attributes which are strings, or to cast a literal to the appropriate date type, etc.)
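For illustration, a parameterized flow into PutHiveQL might look something like this (the table and column names are hypothetical; 93 is the JDBC type code for TIMESTAMP):

Flow file content: INSERT INTO my_table (event_time) VALUES (?)
Attributes:
  hiveql.args.1.type  = 93
  hiveql.args.1.value = <long value, as described above>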
11-10-2016
05:44 PM
1 Kudo
You can use ExecuteScript with Groovy and the following script (assuming your input is newline-delimited):
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
def header = ''
// Read just the first line of the flow file content
session.read(flowFile, { inStream ->
    header = new BufferedReader(new InputStreamReader(inStream)).readLine()
} as InputStreamCallback)
flowFile = session.putAttribute(flowFile, 'header', header)
session.transfer(flowFile, REL_SUCCESS)
This puts the first line in an attribute called 'header', which you can use with RouteOnAttribute to decide where to send the flow. Note that this script doesn't do error handling, but you could put a try/catch around everything from session.read through session.transfer, and have the catch route the flow file to REL_FAILURE (a sketch of that is below).
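For example, a rough sketch of that error handling (the same script with a try/catch added):

import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
try {
    def header = ''
    // Read just the first line of the flow file content
    session.read(flowFile, { inStream ->
        header = new BufferedReader(new InputStreamReader(inStream)).readLine()
    } as InputStreamCallback)
    flowFile = session.putAttribute(flowFile, 'header', header)
    session.transfer(flowFile, REL_SUCCESS)
} catch(e) {
    log.error('Could not read header line', e)
    session.transfer(flowFile, REL_FAILURE)
}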
11-08-2016
08:26 PM
Can you share a stack trace / error log from logs/nifi-app.log? I'm curious to see what part of the code gives a "File too large" error.