Member since: 11-16-2015
Posts: 902
Kudos Received: 664
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 148 | 09-30-2025 05:23 AM |
| | 617 | 06-26-2025 01:21 PM |
| | 452 | 06-19-2025 02:48 PM |
| | 695 | 05-30-2025 01:53 PM |
| | 9705 | 02-22-2024 12:38 PM |
11-18-2016
01:13 PM
QueryCassandra does not support user-defined types, and instead will convert the values to strings. As a workaround, you can use ExecuteScript to parse the strings into values. Here is an example Groovy script to accomplish this:

import groovy.json.*
def flowFile = session.get()
if(!flowFile) return
def directReport = flowFile.getAttribute('direct_report')
def json = new JsonSlurper().setType(JsonParserType.LAX).parseText(directReport)
// Add a flow file attribute for each key/value pair in the JSON object (values coerced to String)
json*.key.each { key ->
    flowFile = session.putAttribute(flowFile, key, json[key] as String)
}
session.provenanceReporter.modifyAttributes(flowFile)
session.transfer(flowFile, REL_SUCCESS)

This script assumes you have used something like EvaluateJsonPath to extract $.results[0].directReports[0] into an attribute named 'direct_report'. It parses the JSON object and adds attributes to the flow file for each key/value pair in the object. You can adjust this to work with content rather than attributes; I have examples of various scripts on my blog.
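For example, a content-based variant might look something like the following rough sketch (untested, and assuming you have already isolated the JSON object as the entire flow file content rather than as an attribute):

import groovy.json.*
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
// Read the entire flow file content as text (assumed to be a single JSON object)
def text = ''
session.read(flowFile, { inStream ->
    text = inStream.getText('UTF-8')
} as InputStreamCallback)
def json = new JsonSlurper().setType(JsonParserType.LAX).parseText(text)
// Promote each top-level key/value pair to a flow file attribute
json*.key.each { key ->
    flowFile = session.putAttribute(flowFile, key, json[key] as String)
}
session.provenanceReporter.modifyAttributes(flowFile)
session.transfer(flowFile, REL_SUCCESS)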
11-18-2016
12:10 PM
1 Kudo
In addition to @Pierre Villard's suggestion: PutHDFS transfers flow files that have been successfully written to HDFS to the "success" relationship, so you can put a processor downstream from PutHDFS (along the "success" relationship), and at that point you can be sure that the file has been successfully written to HDFS and can proceed accordingly.
11-16-2016
10:19 PM
Might need an "AS blob_contents", can't remember
11-16-2016
10:19 PM
2 Kudos
DB2 might be returning a different JDBC type for BLOB than what the processor is expecting, such that it tries to convert the value to a String or something else rather than a byte array. For your workaround, try a column alias for the cast() function, so you can set the name of that column to something Avro will like, such as "blob_contents":

SELECT cast(BLOBTBL.BLOB_CONTENTS as varchar(2000)) blob_contents
FROM BLOB_DECOMP BLOBTBL
fetch first 10 rows only with UR
11-15-2016
04:57 PM
1 Kudo
Does it work if you replace the back-slashes with forward slashes in the Database Driver Jar Url property?
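For example (the path here is purely hypothetical), a value like file:///C:/drivers/mydriver.jar or C:/drivers/mydriver.jar instead of C:\drivers\mydriver.jar.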
11-15-2016
01:26 PM
3 Kudos
What does your current schema look like? If you have a field with a type of something like ["null","int"] then it is being declared as a "nullable union", meaning the value can be null or a valid integer. If instead you use simply "int" for the type, then it should enforce non-null values for that field. If it does not, then the CSV reader from the Kite SDK (used to parse the CSV in the ConvertCSVtoAvro processor) likely treats missing values as empty or default rather than null. If this is the behavior you're seeing, please feel free to file a Jira to improve the handling of missing CSV values.
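For illustration, the two styles of field declaration might look like this in the schema (the record and field names here are just placeholders):

{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    { "name": "optional_count", "type": ["null", "int"], "default": null },
    { "name": "required_count", "type": "int" }
  ]
}

The first field accepts nulls; the second should reject them, subject to the CSV reader behavior described above.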
11-14-2016
06:31 PM
For sufficiently small JSON files, you can use EvaluateJsonPath or ExtractText to get the full body of the document into an attribute before the SplitJson, but keep in mind that this will load the document into memory (rather than being in the content repository and only referenced), and if you modify the flow file, both the original and the new flow file will have a copy in memory. This can get unwieldy pretty quickly. If instead you can determine a smaller portion of the document that is needed, EvaluateJsonPath (with the appropriate JSON Path expression) can store that as an attribute instead.

Alternatively, you might be able to store the original document with PutDistributedMapCache, and then fetch it into an attribute only when it is needed (so the use of UpdateAttribute to delete it when finished is also recommended).

A different approach, if you are comfortable with a scripting language such as Javascript or Groovy, is to use ExecuteScript to invert the behavior of SplitJson; that is, keep the flow file content identical to the original content, and instead store each split value as an attribute in its own flow file. This maintains the original content in each flow file, and as I mentioned the content itself will not be "moved" or copied; instead each flow file maintains a reference to the content (which would be unchanged from the original in this case). A rough sketch of this approach is shown below. If you'd like to see this "inverse" behavior supported in SplitJson (so you can choose whether the splits go in attributes or content), please feel free to file a Jira for this capability.
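Here is that sketch for ExecuteScript with Groovy (untested, and assuming the flow file content is a top-level JSON array; the attribute names split.index and split.value are just placeholders):

import groovy.json.*
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
// Read the original JSON content (assumed to be a top-level array)
def text = ''
session.read(flowFile, { inStream ->
    text = inStream.getText('UTF-8')
} as InputStreamCallback)
def json = new JsonSlurper().parseText(text)
// For each element, clone the flow file (the content is referenced, not copied)
// and store the element as an attribute instead of replacing the content
json.eachWithIndex { element, i ->
    def split = session.clone(flowFile)
    split = session.putAttribute(split, 'split.index', String.valueOf(i))
    split = session.putAttribute(split, 'split.value', JsonOutput.toJson(element))
    session.transfer(split, REL_SUCCESS)
}
session.remove(flowFile)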
11-11-2016
04:58 PM
1 Kudo
What do the resulting HiveQL statement (and attributes) look like? Are you using parameters (with attributes like hiveql.args.N.value and such)? If so, then it appears from looking at the code that it expects a long integer (probably days or seconds from Epoch depending on the data type) for the value, and the appropriate JDBC type value for DATE, TIME, or TIMESTAMP. If parameterized statements don't work, perhaps a ReplaceText to build an explicit HiveQL statement will (such as to remove quotes from attributes which are strings, or to cast a literal to the appropriate date type, etc.)
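For illustration, a parameterized flow into PutHiveQL might look something like this (the table and column names are hypothetical; 93 is the JDBC type code for TIMESTAMP):

Flow file content: INSERT INTO my_table (event_time) VALUES (?)
Attributes:
  hiveql.args.1.type  = 93
  hiveql.args.1.value = <long value, as described above>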
11-10-2016
05:44 PM
1 Kudo
You can use ExecuteScript with Groovy and the following script (assuming your input is newline-delimited):
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
def header = ''
// Read just the first line of the flow file content
session.read(flowFile, { inStream ->
    header = new BufferedReader(new InputStreamReader(inStream)).readLine()
} as InputStreamCallback)
flowFile = session.putAttribute(flowFile, 'header', header)
session.transfer(flowFile, REL_SUCCESS)
This puts the first line in an attribute called 'header', which you can use with RouteOnAttribute to decide where to send the flow. Note that this script doesn't do error handling, but you could put a try/catch around everything from session.read through session.transfer, and have the catch route the flow file to REL_FAILURE (a sketch of that is below).
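For example, a rough sketch of that error handling (the same script with a try/catch added):

import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if(!flowFile) return
try {
    def header = ''
    // Read just the first line of the flow file content
    session.read(flowFile, { inStream ->
        header = new BufferedReader(new InputStreamReader(inStream)).readLine()
    } as InputStreamCallback)
    flowFile = session.putAttribute(flowFile, 'header', header)
    session.transfer(flowFile, REL_SUCCESS)
} catch(e) {
    log.error('Could not read header line', e)
    session.transfer(flowFile, REL_FAILURE)
}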
11-08-2016
08:26 PM
Can you share a stack trace / error log from logs/nifi-app.log? I'm curious to see what part of the code gives a "File too large" error.