About joseomjr

joseomjr · 10-01-2023

If this Python code runs without any external modules, you should be able to run it in a scripted processor and keep everything inside NiFi.

joseomjr · 09-21-2023

Post an example of the actual CSV and I can see if I can help with this.

joseomjr · 09-18-2023

I forgot the ";" at the end of the replacement value. It should be: $1($2='$3');

joseomjr · 09-16-2023

I would do this with a Groovy-based InvokeScriptedProcessor using this code:

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils

class GroovyProcessor implements Processor {

    PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
        .name("BATCH_SIZE")
        .displayName("Batch Size")
        .description("The number of incoming FlowFiles to process in a single execution of this processor.")
        .required(true)
        .defaultValue("1000")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build()

    Relationship REL_SUCCESS = new Relationship.Builder()
        .name("success")
        .description('FlowFiles that were successfully processed are routed here')
        .build()

    Relationship REL_FAILURE = new Relationship.Builder()
        .name("failure")
        .description('FlowFiles that were not successfully processed are routed here')
        .build()

    ComponentLog log

    void initialize(ProcessorInitializationContext context) {
        log = context.logger
    }

    Set<Relationship> getRelationships() {
        return [REL_FAILURE, REL_SUCCESS] as Set
    }

    Collection<ValidationResult> validate(ValidationContext context) { null }

    PropertyDescriptor getPropertyDescriptor(String name) { null }

    void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }

    List<PropertyDescriptor> getPropertyDescriptors() {
        Collections.unmodifiableList([BATCH_SIZE]) as List<PropertyDescriptor>
    }

    String getIdentifier() { null }

    JsonSlurper jsonSlurper = new JsonSlurper()
    JsonOutput jsonOutput = new JsonOutput()

    void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
        ProcessSession session = sessionFactory.createSession()
        try {
            // Pull up to BATCH_SIZE FlowFiles from the queue in a single call
            List<FlowFile> flowFiles = session.get(context.getProperty(BATCH_SIZE).asInteger())
            if (!flowFiles) return

            flowFiles.each { flowFile ->
                // Parse the incoming FlowFile content as a JSON array of orders
                List data = null
                session.read(flowFile, { inputStream ->
                    data = jsonSlurper.parseText(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
                } as InputStreamCallback)

                // Formatting: one line per order, followed by one line per order item
                List outputData = []
                data.each { order ->
                    outputData.add("${order.orderId} ${order.orderName}")
                    order.orderItems.each { orderItem ->
                        outputData.add("${orderItem.orderItemId} ${orderItem.orderItemName}")
                    }
                }

                // Create the output as a child of the input so attributes and lineage carry over
                FlowFile newFlowFile = session.create(flowFile)
                newFlowFile = session.write(newFlowFile, { outputStream ->
                    outputStream.write(outputData.join('\n').getBytes(StandardCharsets.UTF_8))
                } as OutputStreamCallback)
                session.transfer(newFlowFile, REL_SUCCESS)
                session.remove(flowFile)
            }
            session.commit()
        } catch (final Throwable t) {
            log.error('{} failed to process due to {}; rolling back session', [this, t] as Object[])
            session.rollback(true)
            throw t
        }
    }
}

processor = new GroovyProcessor()
```

Don't let all that code scare you; the part doing the formatting is only the data.each loop that builds outputData.
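If you want to try the formatting step outside NiFi first, the sketch below reproduces just that loop in plain Python. It assumes the FlowFile content is a JSON array of orders using the field names from the Groovy code (orderId, orderName, orderItems, orderItemId, orderItemName); the sample record is made up and the function name format_orders is hypothetical. It uses only the standard library and avoids Python 3-only syntax.

```python
import json


def format_orders(json_text):
    """Flatten a JSON array of orders into text: one line per order,
    followed by one line per item in that order (mirrors the Groovy loop).
    format_orders is a hypothetical helper name used for illustration."""
    lines = []
    for order in json.loads(json_text):
        lines.append("%s %s" % (order["orderId"], order["orderName"]))
        # Treat a missing or null orderItems field as "no items"
        for item in order.get("orderItems") or []:
            lines.append("%s %s" % (item["orderItemId"], item["orderItemName"]))
    return "\n".join(lines)


# Made-up sample input with one order containing one item
sample = '[{"orderId": 1, "orderName": "A", "orderItems": [{"orderItemId": 10, "orderItemName": "X"}]}]'
print(format_orders(sample))
```

One difference from the Groovy version: a null field value prints as "None" here rather than "null".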

joseomjr · 09-16-2023

Are you already using, or have you considered using, the Elasticsearch (ES) Bulk API? See: Bulk API | Elasticsearch Guide [8.9] | Elastic
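The Bulk API's main benefit is sending many documents in one request instead of one request per document. A minimal Python sketch of building the newline-delimited JSON (NDJSON) request body follows; the index name "orders", the document IDs and contents, and the helper build_bulk_body are all hypothetical, and actually sending the request is left out.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an Elasticsearch Bulk API request body (NDJSON).

    docs is a list of (doc_id, source_dict) pairs. Each document becomes
    two lines: an "index" action/metadata line, then the document source.
    build_bulk_body is a hypothetical helper name.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body("orders", [("1", {"orderName": "A"}), ("2", {"orderName": "B"})])
print(body)
```

The resulting body would be POSTed to the _bulk endpoint with the header Content-Type: application/x-ndjson.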

joseomjr · 09-16-2023

The ReplaceText processor should work. This regex search pattern should do the trick:

^(.*?)\(([^=]*)=([^\)]*)\);$

with this replacement value:

$1($2='$3')

Don't forget to set the Evaluation Mode to Line-by-Line.
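To sanity-check the pattern before putting it into ReplaceText, here is a small Python sketch. It assumes the intended search pattern is ^(.*?)\(([^=]*)=([^\)]*)\);$ and uses the replacement value with the trailing ";" from the later correction; Python writes the back-references as \1, \2, \3 instead of NiFi's $1, $2, $3. The input line setName(name=John); is a made-up example and replace_line_by_line is a hypothetical helper.

```python
import re

# Assumed search pattern (Java regex as typed into ReplaceText; Python's re
# module interprets this particular pattern the same way).
SEARCH = re.compile(r"^(.*?)\(([^=]*)=([^\)]*)\);$")
# ReplaceText's "$1($2='$3');" rewritten with Python back-references.
REPLACEMENT = r"\1(\2='\3');"


def replace_line_by_line(text):
    """Apply the substitution to each line separately, roughly like
    Evaluation Mode = Line-by-Line. Non-matching lines pass through unchanged."""
    return "\n".join(SEARCH.sub(REPLACEMENT, line) for line in text.split("\n"))


print(replace_line_by_line("setName(name=John);\nno match here"))
```

Here the first line becomes setName(name='John'); and the second line is left as-is because it does not match.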

joseomjr · 09-16-2023

As far as I know, NiFi leverages Jython for Python scripts, which currently limits you to Python 2.7-compatible code and to modules written in pure Python. See: Home | Jython

joseomjr · 09-16-2023

This doesn't look like a valid regex pattern:

\E^modbus_log\.\d{4}-\d{2}-\d{2}_\d{2}-\d{2}$

Maybe it should be:

^modbus_log\.\d{4}-\d{2}-\d{2}_\d{2}-\d{2}$
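A quick way to check the suggested pattern is to run it against a few file names. This Python sketch uses made-up names that assume the timestamp format implied by the pattern (YYYY-MM-DD_HH-MM); Python's re module treats these particular constructs (^, \., \d{n}, $) the same way Java's regex engine, which NiFi uses, does.

```python
import re

# Suggested pattern: "modbus_log." followed by a date and time stamp, nothing else.
MODBUS_LOG = re.compile(r"^modbus_log\.\d{4}-\d{2}-\d{2}_\d{2}-\d{2}$")

# Hypothetical file names to test against
candidates = [
    "modbus_log.2023-09-16_14-05",     # expected to match
    "modbus_log.2023-09-16_14-05.gz",  # extra suffix, should not match
    "modbus_log",                      # no timestamp, should not match
]
matches = [name for name in candidates if MODBUS_LOG.match(name)]
print(matches)
```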

joseomjr · 08-28-2023

The two lines I provided go at the very top; all of your code comes after them.

joseomjr · 07-29-2023

NiFi is great, with so many out-of-the-box processors. However, I have sometimes found that a highly specialized scripted Groovy processor that fetches 1,000 or more FlowFiles at a time performs significantly faster, especially if the custom processor consolidates several processors into one.

Member Since	06-14-2023 12:02 PM
Last Visited	12-17-2024 09:55 PM
Posts	95
Kudos received	33

Cloudera Community

Re: Nifi 2.0.0 M1 Installation error with python

Re: how to replace empty string with null in neste...

Re: ListenUDP Fault tolerance

Re: terminating kafka connection if publish kafka ...

Re: unable to resolve class groovy.yaml.YamlSlurpe...

Re: Nifi Unzip files

Re: Manipulate string in CSV using NiFi processor

Re: How to replace/add some text for a matching pa...

Re: How to access the nested fields in a json usin...

Re: How to speed up the PutElasticsearchHttpRecord...

Re: How to replace/add some text for a matching pa...

Re: No module name Selenium

Re: NiFi - TailFile - try to read multiple files ,...

Re: Unable to import python external modules in ni...

Re: Nifi Perfromnce issues