06-07-2017
02:29 PM
1 Kudo
@J. D. Bacolod Have you considered using the PutDistributedMapCache and GetDistributedMapCache processors? Use two separate dataflows. One runs on a cron schedule and is responsible for obtaining the token and writing it to the distributed map cache via the PutDistributedMapCache processor. The second flow performs all your other operations using that token: just before the InvokeHTTP processor, add a GetDistributedMapCache processor that reads the token from the distributed map cache into a FlowFile attribute, then use that attribute to pass the token in your connections.

One thing to keep in mind: a new token may be written to the cache after a FlowFile has already picked up the old one, which would result in an auth failure. So you will want your flow to loop back to the GetDistributedMapCache processor to get the latest token on an auth failure from your InvokeHTTP processor. This flow does not keep track of when a token expires, but if you know how long a token is good for, you can set your cron schedule accordingly.

Thanks, Matt
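Outside NiFi, the same cache-then-retry pattern looks like this minimal Java sketch. The endpoints, the header name, and the AtomicReference standing in for the distributed map cache are all assumptions for illustration, not part of the flow described above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicReference;

public class CachedTokenClient {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Stands in for the distributed map cache: the cron flow refreshes it,
    // the worker flow reads it.
    private static final AtomicReference<String> TOKEN_CACHE = new AtomicReference<>();

    // Hypothetical token endpoint for illustration only.
    static String fetchFreshToken() throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("https://auth.example.com/token")).GET().build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    static HttpResponse<String> callApi() throws Exception {
        String token = TOKEN_CACHE.get();        // the GetDistributedMapCache step
        if (token == null) {
            token = fetchFreshToken();
            TOKEN_CACHE.set(token);
        }
        HttpResponse<String> resp = send(token);
        if (resp.statusCode() == 401) {          // auth failure on the InvokeHTTP step
            token = fetchFreshToken();           // loop back for the latest token
            TOKEN_CACHE.set(token);
            resp = send(token);                  // retry once with the new token
        }
        return resp;
    }

    static HttpResponse<String> send(String token) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create("https://api.example.com/data"))
                .header("Authorization", "Bearer " + token)
                .GET().build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```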
06-13-2017
05:49 PM
1 Kudo
Dear @Matt Burgess, what luck: yes, it worked after upgrading to HDF 3.0. Thanks a lot, man :) Much appreciated.
06-06-2017
01:53 PM
1 Kudo
The session provides methods to read and write the flow file content:

- If you are reading only, session.read with an InputStreamCallback will give you an InputStream to the flow file content.
- If you are writing only, session.write with an OutputStreamCallback will give you an OutputStream to the flow file content.
- If you are reading and writing at the same time, a StreamCallback will give you access to both an InputStream and an OutputStream.

In your case, if you are just looking to extract a value, you likely need an InputStreamCallback, and you would use the InputStream to read the content and parse it appropriately for your data. You can look at examples in the existing processors: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L313-L318

Keep in mind that the linked example reads the whole content of the flow file into memory, which can be dangerous with very large flow files, so whenever possible it is best to process the content in chunks; a sketch follows below.
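Here is a minimal sketch of the read-only case that processes the content in chunks rather than buffering it all at once. The class name, attribute name, and the `token=` pattern are hypothetical; only the ProcessSession/InputStreamCallback usage reflects the actual NiFi API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.io.InputStreamCallback;

public class ExtractValueSnippet {

    // Called from a processor's onTrigger(context, session).
    public static FlowFile extractValue(final ProcessSession session, FlowFile flowFile) {
        final AtomicReference<String> extracted = new AtomicReference<>();

        session.read(flowFile, new InputStreamCallback() {
            @Override
            public void process(final InputStream in) throws IOException {
                // Read line by line (a form of chunking) rather than
                // loading the entire content into one byte[].
                final BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("token=")) {   // hypothetical value to extract
                        extracted.set(line.substring("token=".length()));
                        break;
                    }
                }
                // NiFi closes the underlying stream when the callback returns.
            }
        });

        if (extracted.get() != null) {
            flowFile = session.putAttribute(flowFile, "extracted.value", extracted.get());
        }
        return flowFile;
    }
}
```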
06-01-2017
03:23 PM
1 Kudo
@Alvaro Dominguez The primary node could change at any time. You could use the PostHTTP and ListenHTTP processors to route FlowFiles from multiple nodes to a single node. My concern would be the heap usage needed to merge (zip) 160K FlowFiles on a single NiFi node: the FlowFile metadata for all of the FlowFiles being zipped would be held in heap memory until the zip is complete.

Any objection to having a zip of zips? In other words, you could still create 4 unique zip files (one per node, each with a unique filename), then send those zipped files to one node to be zipped once more into a new zip with the single name you want written into HDFS.

Thanks, Matt
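To see that a zip of zips is just an ordinary zip whose entries happen to be archives, here is a small Java sketch using java.util.zip. The per-node filenames and the final name are hypothetical, and in a real flow each stage would be a merge processor rather than hand-written code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipOfZips {
    public static void main(String[] args) throws IOException {
        // Hypothetical per-node archives produced in stage one (one per NiFi node).
        List<Path> nodeZips = List.of(
                Path.of("node1.zip"), Path.of("node2.zip"),
                Path.of("node3.zip"), Path.of("node4.zip"));

        // Stage two on the collector node: wrap the four archives in a single
        // outer zip with the final filename to be written to HDFS.
        try (ZipOutputStream out = new ZipOutputStream(
                Files.newOutputStream(Path.of("final.zip")))) {
            for (Path nodeZip : nodeZips) {
                out.putNextEntry(new ZipEntry(nodeZip.getFileName().toString()));
                Files.copy(nodeZip, out);   // inner zips become ordinary entries
                out.closeEntry();
            }
        }
    }
}
```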
06-13-2017
03:25 PM
@Oleksandr Solomko have you changed the default value of the "nifi.queue.swap.threshold" property in nifi.properties? If so, you may be running into NIFI-3897.
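For reference, the property in question lives in conf/nifi.properties; a sketch showing its stock default (to my knowledge, 20000 FlowFiles per connection queue before overflow is swapped to disk):

```properties
# conf/nifi.properties
# Number of FlowFiles a connection queue may hold in memory before
# the overflow is swapped out to disk (default shown).
nifi.queue.swap.threshold=20000
```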
06-01-2017
11:23 AM
@Simran Kaur I had a feeling your issue was related to a missing config. Glad to hear you got it working. If this answer addressed your original question, please mark it as accepted. As for your other question, I see you already started a new post (https://community.hortonworks.com/questions/105720/nifi-stream-using-listenhttp-processor-creates-too.html). That is the correct approach in this forum; we want to avoid asking unrelated questions in the same post. I will have a look at that post as well. Thank you, Matt
06-02-2017
02:16 PM
Thanks, @Matt Clarke. Will downgrade ASAP.
11-16-2018
01:06 PM
Article content updated to reflect the new provenance implementation recommendation and a change in the JVM garbage collector recommendation.
05-25-2017
07:23 PM
As Matt pointed out, in order to make use of 100 concurrent tasks on a processor, you will need to increase the Maximum Timer Driven Thread Count above 100. Also, as Matt pointed out, this means each node in your cluster has that many threads available.

As far as general performance goes, the performance of a single request/response with Jetty depends on what is being done in the request/response. We can't just say "Jetty can process thousands of records in seconds" unless we know what is being done with those records in Jetty. If you deployed a WAR with a servlet that immediately returned 200, its performance would be very different from that of a servlet that had to take the incoming request and write it to a database, an external system, or disk.

With HandleHttpRequest/Response, each request becomes a flow file, which means updates to the flow file repository and content repository (disk I/O), and then transferring those flow files to the next processor, which reads them (more disk I/O). I'm not saying this can't be fast, but there is more happening there than in a servlet that returns 200 immediately.

What I was getting at with the last question is that if you have 100 concurrent tasks on HandleHttpRequest and only 1 concurrent task on HandleHttpResponse, the response side will eventually become the bottleneck.
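For the baseline in the comparison above, a servlet that returns 200 immediately might look like the following minimal sketch (the Jakarta Servlet API, class name, and path are assumptions for illustration):

```java
import java.io.IOException;
import jakarta.servlet.annotation.WebServlet;
import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

// The fastest possible baseline: no parsing, no persistence, no disk I/O.
// Anything HandleHttpRequest/Response does on top of this (flow file repo
// and content repo writes, queue transfers) adds per-request latency.
@WebServlet("/ping")
public class ImmediateOkServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setStatus(HttpServletResponse.SC_OK);  // return 200 immediately
    }
}
```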