Member since: 02-19-2023
Posts: 10
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6633 | 02-22-2023 09:59 PM
04-09-2024 11:58 AM
I'm not sure if this can be done with out-of-the-box processors, but I would do it with a Groovy-based InvokeScriptedProcessor using code like this:

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import org.apache.nifi.components.PropertyDescriptor
import org.apache.nifi.components.ValidationContext
import org.apache.nifi.components.ValidationResult
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.logging.ComponentLog
import org.apache.nifi.processor.ProcessContext
import org.apache.nifi.processor.ProcessSession
import org.apache.nifi.processor.ProcessSessionFactory
import org.apache.nifi.processor.Processor
import org.apache.nifi.processor.ProcessorInitializationContext
import org.apache.nifi.processor.Relationship
import org.apache.nifi.processor.exception.ProcessException
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.OutputStreamCallback
import org.apache.nifi.processor.util.StandardValidators

class GroovyProcessor implements Processor {

    // How many elements of the incoming list go into each output FlowFile
    PropertyDescriptor CHUNK_SIZE = new PropertyDescriptor.Builder()
        .name("CHUNK_SIZE")
        .displayName("Chunk Size")
        .description("The chunk size to break up the incoming list of values.")
        .required(true)
        .defaultValue("5")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build()

    Relationship REL_SUCCESS = new Relationship.Builder()
        .name("success")
        .description("FlowFiles that were successfully processed are routed here")
        .build()

    Relationship REL_FAILURE = new Relationship.Builder()
        .name("failure")
        .description("FlowFiles that were not successfully processed are routed here")
        .build()

    ComponentLog log
    JsonSlurper jsonSlurper = new JsonSlurper()

    void initialize(ProcessorInitializationContext context) {
        log = context.logger
    }

    Set<Relationship> getRelationships() {
        Set<Relationship> relationships = new HashSet<>()
        relationships.add(REL_FAILURE)
        relationships.add(REL_SUCCESS)
        return relationships
    }

    // The remaining Processor methods are required by the interface but need no logic here
    Collection<ValidationResult> validate(ValidationContext context) { }

    PropertyDescriptor getPropertyDescriptor(String name) { }

    void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }

    List<PropertyDescriptor> getPropertyDescriptors() {
        List<PropertyDescriptor> descriptors = new ArrayList<>()
        descriptors.add(CHUNK_SIZE)
        return Collections.unmodifiableList(descriptors)
    }

    String getIdentifier() { }

    void onScheduled(ProcessContext context) throws ProcessException { }

    void onUnscheduled(ProcessContext context) throws ProcessException { }

    void onStopped(ProcessContext context) throws ProcessException { }

    void setLogger(ComponentLog logger) { }

    void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
        ProcessSession session = sessionFactory.createSession()
        try {
            List<FlowFile> flowFiles = session.get(1)
            if (!flowFiles) return
            Integer chunkSize = context.getProperty(CHUNK_SIZE).asInteger()
            flowFiles.each { FlowFile flowFile ->
                Map customAttributes = ["mime.type": "application/json"]

                // Parse the incoming JSON document
                Map data = null
                session.read(flowFile, { inputStream -> data = jsonSlurper.parse(inputStream) } as InputStreamCallback)

                // Split the objectIDs list into chunks and emit one FlowFile per chunk
                List<List<String>> chunkedObjectIDs = data.objectIDs.collate(chunkSize)
                chunkedObjectIDs.each { chunk ->
                    Map chunkData = ["objectIDs": chunk]
                    FlowFile newFlowFile = session.create()
                    newFlowFile = session.write(newFlowFile, { outputStream ->
                        outputStream.write(JsonOutput.toJson(chunkData).getBytes("UTF-8"))
                    } as OutputStreamCallback)
                    // putAllAttributes returns a new FlowFile reference; keep it
                    newFlowFile = session.putAllAttributes(newFlowFile, customAttributes)
                    session.transfer(newFlowFile, REL_SUCCESS)
                }
                session.remove(flowFile)
            }
            session.commit()
        } catch (final Throwable t) {
            log.error('{} failed to process due to {}; rolling back session', [this, t] as Object[])
            session.rollback(true)
            throw t
        }
    }
}

processor = new GroovyProcessor()
```
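If you want to sanity-check the core of this outside NiFi: the splitting is just Groovy's collate. A minimal standalone sketch (the sample objectIDs are made up):

```groovy
import groovy.json.JsonSlurper

// Parse a sample document shaped like the script expects and split its
// objectIDs into chunks of 2; each sub-list would become one output FlowFile.
def data = new JsonSlurper().parseText('{"objectIDs": ["a", "b", "c", "d", "e"]}')
assert data.objectIDs.collate(2) == [["a", "b"], ["c", "d"], ["e"]]
```

So with Chunk Size set to 2, that input would yield three FlowFiles on the success relationship.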
01-17-2024 06:37 AM
1 Kudo
@glad1 No, not necessary. I suggested it because I was still unclear how often your initial ExecuteSQL was producing a source file. The process group (PG) makes it easy to throttle processing per source FlowFile, so you would get one merged FlowFile for each produced FlowFile. Thanks, Matt
11-22-2023 09:52 PM
I've attached the image above; this is how the data looks. I want to remove the first 7 rows so that the 8th row (the header row) becomes the first.
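(For reference, the row-dropping itself is simple once the content is plain text; a minimal sketch with made-up sample rows, not code from this thread:)

```groovy
// Build a fake 10-row text blob, then drop the first 7 lines so the
// 8th line (the header row) becomes the first.
def text = (1..10).collect { "row $it" }.join('\n')
def cleaned = text.readLines().drop(7).join('\n')
assert cleaned.readLines().first() == 'row 8'
```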
02-22-2023 09:59 PM
Thanks. Everything works fine after restarting my PC.