About mburgess

mburgess · ‎06-30-2017

@Alvin Jin To answer your question about which processors to use: it depends on what you want to do with the whole CSV file. Your question only mentions splitting and ignoring the header, and the CSVReader takes care of that. The record-aware processors in NiFi 1.3.0 include:

ConsumeKafkaRecord_0_10: Gets messages from a Kafka topic and bundles them into a single flow file instead of one per message
ConvertRecord: Converts records from one data format to another (Avro to JSON, e.g.)
LookupRecord: Uses fields from a record to look up a value, which can be added back to the record
PartitionRecord: Groups "like" records (based on user-provided criteria) into individual flow files
PublishKafkaRecord_0_10: Posts messages to a Kafka topic
PutDatabaseRecord: Executes a specified operation (INSERT, UPDATE, DELETE, e.g.) on a database for each record in a flow file
PutElasticsearchHttpRecord: Executes a specified operation ("index", e.g.) on an Elasticsearch cluster for each record in a flow file
QueryRecord: Executes SQL queries on fields from the records. This can be used to filter, aggregate, etc.
SplitRecord: Splits records into smaller flow files. Usually only used when downstream processors are not record-aware
UpdateRecord: Updates field(s) in each record of a flow file

Also I wanted to mention: if for some reason all your CSV columns are strings, you can set "Schema Access Strategy" to "Use String Fields From Header", and then you don't need a schema or schema registry. Otherwise, if you want to provide a schema, you're not required to use a schema registry; you can just paste your schema into the Schema Text property and set "Schema Access Strategy" to "Use Schema Text Property".
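To illustrate what "Use String Fields From Header" amounts to, here is a minimal standalone Python sketch (not NiFi code). It assumes a small made-up CSV sample; the function name read_records and the column names are hypothetical. The header line supplies the field names, it is not emitted as a record, and every value stays a string.

```python
import csv
import io


def read_records(csv_text):
    """Parse CSV text; the header line names the fields and every value stays a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample data, only for illustration.
sample = "id,name,createdOn\n1,alice,2017-06-28\n2,bob,2017-06-29\n"
records = read_records(sample)
for record in records:
    print(record)
```

The header row never shows up in the output, which is the same effect the CSVReader gives you without a separate split-and-skip step.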

mburgess · ‎06-29-2017

In addition to @Wynner's answer, if you'd like to keep using ExecuteScript, you can pass in arguments as user-defined properties (aka dynamic properties) or flow file attributes and use them in the script. For examples of leveraging user-defined properties in ExecuteScript, check out Part 3 of my ExecuteScript Cookbook article series in HCC; it has examples in Jython.

mburgess · ‎06-28-2017

You could set the Header Line Count to 0, then send the flowfiles to a RouteOnAttribute processor where you can "skip" the first line by routing on the following Expression Language statement: ${fragment.index:gt(0)} The first line will be routed to "unmatched" and the rest to "matched" or the user-defined property name (depending on the value of the Routing Strategy property). Note that this requires the Line Split Count property be set to 1 in SplitText. Alternatively, if you are using (or can upgrade to) NiFi 1.3.0, you can use a record-aware processor with a CSVReader. This reader can be configured to (among other things) skip the header line. The record-aware processors also offer better performance when working with flow files that contain many "records" (such as a CSV file where each "record" is a row).
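The routing idea can be sketched outside NiFi in a few lines of Python. This is only an illustration, not NiFi code: the function name and the first_index parameter are hypothetical, and it assumes the first split carries index 0, as the expression above implies. If your SplitText numbers fragments starting from 1, the comparison threshold would need to be 1 instead.

```python
def route_on_fragment_index(lines, first_index=0):
    """Return (matched, unmatched), mimicking a 'fragment.index > first_index' check.

    Each line stands in for one split flow file (Line Split Count = 1).
    """
    matched, unmatched = [], []
    for offset, line in enumerate(lines):
        fragment_index = first_index + offset
        if fragment_index > first_index:
            matched.append(line)      # data rows
        else:
            unmatched.append(line)    # the header line
    return matched, unmatched


# Hypothetical sample content, only for illustration.
rows = "id,name\n1,alice\n2,bob".splitlines()
matched, unmatched = route_on_fragment_index(rows)
print(unmatched)
print(matched)
```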

mburgess · ‎06-28-2017

As of NiFi 1.3.0, you can use UpdateRecord for this. If your incoming field name is "createdOn", you can add a user-defined property named "/createdOn" whose value is the following:

${field.value:toDate('yyyy-MM-dd HH:mm:ss.SSS'):toNumber()}

The pattern uses uppercase "MM" for the month; lowercase "mm" means minutes. Note that you may need to change the type of createdOn from String (in the Reader's schema) to Long (in the Writer's schema).
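For reference, the same string-to-number conversion can be sketched in standalone Python. This is not NiFi code: the function name to_epoch_millis is hypothetical, and the sketch assumes the timestamps are in UTC, whereas NiFi may interpret them in a different time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(text):
    """Convert 'yyyy-MM-dd HH:mm:ss.SSS' text to milliseconds since the epoch (UTC assumed)."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2017-06-28 12:34:56.789"))
```

The result is a whole number of milliseconds, which is why the field type on the Writer side would be Long rather than String.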

mburgess · ‎06-23-2017

You can try a thread dump (with jstack or nifi.sh dump) while it is waiting to shut down; you may be able to spot the culprit in the output.

mburgess · ‎06-21-2017

I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?

mburgess · ‎06-21-2017

The documentation says "The Expression Language allows single quotes and double quotes to be used interchangeably". Try double-quotes in your EL expression.

mburgess · ‎06-21-2017

You can accomplish this with ExecuteScript. The following example uses Groovy as the language and Jayway's JsonPath as the library for JSONPath parsing. First I had to download 3 JARs (json-path and its required transitive dependencies) into a directory. Then I set the Module Directory property to the path of this directory. Note that I have also added a dynamic property, whose name will become an attribute and whose value supports Expression Language and (after evaluation) should contain a JSONPath expression used to retrieve the value from the content of the JSON flow file.

The Script Body is the following Groovy script:

    import com.jayway.jsonpath.*

    def flowFile = session.get()
    if (!flowFile) return

    // Parse the flow file content as JSON
    def inputStream = session.read(flowFile)
    def json = JsonPath.parse(inputStream)
    inputStream.close()

    // Each dynamic property maps an attribute name to a JSONPath expression
    context.properties.findAll { p, s -> p.dynamic }.each { pd, value ->
        def prop = context.getProperty(pd)
        try {
            flowFile = session.putAttribute(flowFile, pd.name, json.read(prop.evaluateAttributeExpressions(flowFile).value))
        } catch (e) {
            log.error("Error evaluating JSONPath expression in property ${pd.name}: ${prop?.value} , ignoring...", e)
        }
    }
    session.transfer(flowFile, REL_SUCCESS)

I tested this with a GenerateFlowFile. After the ExecuteScript transfers the flow file, it has the desired attribute name/value. This should work with any number of attributes/JSONPaths per flow file. In addition, I have written NIFI-4100 to cover the improvement to the EvaluateJsonPath processor to support Expression Language.
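The overall pattern (one attribute per dynamic property, each holding a path into the JSON content, with failures logged and skipped) can be sketched in standalone Python. This is not the Groovy script above and does not use Jayway's JsonPath: read_path is a hypothetical helper that only handles a tiny subset of JSONPath (dotted names and numeric indexes), and the property names and sample content are made up.

```python
import json
import re

# Matches either ".name" or "[123]" at the current position.
_TOKEN = re.compile(r"\.([A-Za-z_][\w-]*)|\[(\d+)\]")


def read_path(document, path):
    """Resolve a simple path such as '$.user.tags[0]' (a small subset of JSONPath)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current, pos = document, 1
    while pos < len(path):
        token = _TOKEN.match(path, pos)
        if token is None:
            raise ValueError("unsupported path syntax at offset %d" % pos)
        key, index = token.group(1), token.group(2)
        current = current[key] if key is not None else current[int(index)]
        pos = token.end()
    return current


def extract_attributes(content, path_properties):
    """Return {attribute name: value} for every path that resolves; skip the rest."""
    document = json.loads(content)
    attributes = {}
    for name, path in path_properties.items():
        try:
            attributes[name] = str(read_path(document, path))
        except (KeyError, IndexError, TypeError, ValueError) as error:
            print("Error evaluating path in property %s: %s, ignoring... (%s)" % (name, path, error))
    return attributes


# Hypothetical content and properties, only for illustration.
content = '{"user": {"name": "alice", "tags": ["a", "b"]}}'
properties = {"user.name": "$.user.name", "first.tag": "$.user.tags[0]", "missing": "$.nope"}
attributes = extract_attributes(content, properties)
print(attributes)
```

As in the script, a path that cannot be evaluated is reported and ignored, so the remaining attributes are still set.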

mburgess · ‎06-21-2017

Also, as of NiFi 1.3.0 / HDF 3.0.0, GenerateTableFetch accepts incoming connections/flow files, so you can use ListDatabaseTables -> GenerateTableFetch -> RPG -> Input Port -> ExecuteSQL to fully distribute the fetching of batches of rows across your NiFi cluster. The RPG -> Input Port part is optional and only used on a cluster if you want to fetch rows in parallel.

mburgess · ‎06-21-2017

The answer in this StackOverflow post refers to documentation saying an array will be returned; this appears to be happening after the JSONPath is evaluated (which is why appending a [0] does not work). The post also implies the answer: in NiFi, you can choose "json" as the Return Type and "flowfile-attribute" as the Destination in EvaluateJsonPath (let's say your dynamic property has key "my.attr" and a value of your JSONPath expression above), then follow that processor with an UpdateAttribute processor, setting "my.attr" to the following: ${my.attr:jsonPath('$[0]')} This will overwrite the original "my.attr" value by hoisting the value out of the array. If you need that value back in the content, you can follow this with a ReplaceText processor to replace the Entire Text with the value of "my.attr".
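The hoisting step can be pictured with a short standalone Python sketch. It is not NiFi code: hoist_first is a hypothetical name, and the sample attribute value is made up. It assumes the attribute holds a JSON array with at least one element, which is what EvaluateJsonPath produces with the "json" Return Type in the case described above.

```python
import json


def hoist_first(attribute_value):
    """Take a JSON array held as text and return its first element."""
    return json.loads(attribute_value)[0]


# Hypothetical attribute value, only for illustration.
my_attr = '["2017-06-21"]'
my_attr = hoist_first(my_attr)  # overwrite the attribute with the scalar
print(my_attr)
```

One difference worth noting: the Expression Language function yields text, while this sketch returns whatever type the first element decodes to.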

Last Visited	‎01-16-2026 01:45 PM

Member Since	‎11-16-2015 02:21 PM
Posts	911
Kudos received	662

Cloudera Community

Re: Compare data within the JSON using NIFI

Re: how to join three csv files like sql on condit...

Re: How to see the Data Provenance and Lineage in ...

Re: Apache NiFi - RouteText has no matches

Re: Nifi Building error when creating a brand new ...

Re: How to remove the header when using NiFi Split...

Re: How to pass command line arguments in executes...

Re: How to remove the header when using NiFi Split...

Re: NiFi processor: Convert string(datetime format...

Re: NiFi - restarting a node gracefully

Re: Execute script processor don't support utf-8 e...

Re: NiFi expression language inside ExecuteSQL sta...

Re: How to use an attribute in nifi to evaluate js...

Re: Dynamic Creation of Processors in NiFi

Re: Unable to return a scalar value for the expres...