Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4159 | 12-03-2018 02:26 PM |
 | 3118 | 10-16-2018 01:37 PM |
 | 4248 | 10-03-2018 06:34 PM |
 | 3072 | 09-05-2018 07:44 PM |
 | 2333 | 09-05-2018 07:31 PM |
03-02-2018
02:53 PM
1 Kudo
I've answered this on Stack Overflow: https://stackoverflow.com/questions/49059136/nifi-java-lang-nosuchmethoderror-org-apache-hadoop-conf-configuration-reloadexi
03-01-2018
07:37 PM
Is it possible that your content does not contain any newline characters like \n or \r? I'm wondering whether ReplaceText relies on newlines when it works line-by-line; if it never encounters one, it may end up loading the entire content at once, which is not what we want.
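To illustrate the failure mode, here is a standalone Java sketch (not NiFi's actual ReplaceText implementation): line-oriented reading degenerates into whole-content reading when the input contains no line breaks.

```java
import java.io.BufferedReader;
import java.io.StringReader;

public class NoNewlineDemo {
    public static void main(String[] args) throws Exception {
        // 10 MB of content with no \n or \r anywhere
        String content = "a".repeat(10_000_000);
        BufferedReader reader = new BufferedReader(new StringReader(content));
        // readLine() scans for a newline and, finding none, returns the
        // entire content as one "line" -- so a line-oriented processor
        // would have to buffer all of it at once.
        String line = reader.readLine();
        System.out.println("First 'line' length: " + line.length());
    }
}
```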
03-01-2018
05:48 PM
1 Kudo
Is ReplaceText configured as shown above with Line-By-Line and a 1MB buffer?
03-01-2018
04:33 PM
OK, GetFile -> ReplaceText -> PutFile should be fine, it will just take a long time 🙂

- GetFile streams the file from the source location to NiFi's internal content repository
- ReplaceText reads line-by-line from the content repo and writes line-by-line back to the content repo
- PutFile streams from the content repo to local disk
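The key point is that each step streams with a small fixed buffer. A minimal Java sketch of that pattern (an illustration of the streaming idea, not NiFi's repository code, with hypothetical paths):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamCopy {
    public static void main(String[] args) throws Exception {
        Path src = Paths.get("source/big.txt");   // hypothetical paths
        Path dst = Paths.get("target/big.txt");
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] buffer = new byte[8192]; // fixed-size buffer: heap use stays flat
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```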
03-01-2018
02:53 PM
You don't necessarily need a heap larger than the file unless you are using a processor that reads the entire file into memory. Most processors should not do that unless absolutely necessary, and if they do, they should document it.

In your "list-->fetch-->splittext-->replacetext-->mergecontent" approach, the issue is that you are splitting a single flow file into millions of flow files. Even though the content of all these flow files won't be in memory, it's still millions of Java objects on the heap. Whenever possible you should avoid this splitting approach.

Instead, use the "record" processors to manipulate the data in place and keep your 22 GB as a single flow file. I don't know exactly what you need to do to each record, but most likely, after your fetch processor, you just need an UpdateRecord processor that streams one record in, updates a field, and streams the record out. It never loads the entire content into memory and never creates millions of flow files.
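A minimal Java sketch of the record-streaming idea behind UpdateRecord, assuming a simple one-record-per-line CSV and a hypothetical field update (NiFi's actual record readers and writers handle schemas and formats for you):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RecordStreamSketch {
    public static void main(String[] args) throws Exception {
        try (BufferedReader reader = Files.newBufferedReader(
                 Paths.get("records.csv"), StandardCharsets.UTF_8);   // hypothetical file
             BufferedWriter writer = Files.newBufferedWriter(
                 Paths.get("updated.csv"), StandardCharsets.UTF_8)) {
            String record;
            while ((record = reader.readLine()) != null) {
                // one record in memory at a time; never millions of objects at once
                String[] fields = record.split(",", -1);
                if (fields.length > 1) {
                    fields[1] = fields[1].trim().toUpperCase(); // hypothetical update
                }
                writer.write(String.join(",", fields));
                writer.newLine();
            }
        }
    }
}
```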
02-28-2018
03:18 PM
There is also a GrokReader for the record processors, and its additional-details documentation lists the default patterns it uses. See the end of this page: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.5.0/org.apache.nifi.grok.GrokReader/additionalDetails.html
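For example (a hypothetical Grok Expression, not from the original thread), configuring the GrokReader with `%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}` would parse each log line into timestamp, level, and message fields, all built from those default patterns.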
02-21-2018
01:33 PM
Have you added any additional JARs to NiFi's lib directory? This looks like something is on the classpath that shouldn't be.
02-21-2018
01:31 PM
1 Kudo
In authorizers.xml you have "Initial User Identity 1" and "Initial User Identity 2" for your two node identities; you need to add another one for your initial admin. You may need to delete users.xml and authorizations.xml before trying again, in case they were already created in a bad state.
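For illustration, the relevant userGroupProvider block in authorizers.xml might look like the sketch below, with hypothetical DNs:

```xml
<userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial User Identity 1">CN=nifi-node-1, OU=NIFI</property>
    <property name="Initial User Identity 2">CN=nifi-node-2, OU=NIFI</property>
    <!-- added for the initial admin; must match the Initial Admin Identity
         configured in the accessPolicyProvider -->
    <property name="Initial User Identity 3">CN=admin, OU=NIFI</property>
</userGroupProvider>
```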
02-12-2018
04:54 PM
2 Kudos
OK, how about ExecuteStreamCommand, which accepts incoming flow files?
02-11-2018
03:40 PM
As far as I know, none of the HTTP processors in NiFi support Kerberos authentication, so I don't think you'll be able to do the first idea. For the second idea, you should be able to use the ExecuteProcess processor; its Command Arguments property supports expression language.
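As a hypothetical sketch (these property values are illustrative, not from the original thread), ExecuteProcess might be configured along these lines, with expression language in the arguments:

```
Command:            curl
Command Arguments:  --negotiate -u : http://${hostname(true)}:8080/some/endpoint
```

Here `${hostname(true)}` is a NiFi expression-language function that resolves to the node's fully qualified hostname; since ExecuteProcess takes no incoming flow file, expressions are evaluated against the environment and variable registry rather than flow-file attributes.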