Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4168 | 12-03-2018 02:26 PM |
|  | 3135 | 10-16-2018 01:37 PM |
|  | 4253 | 10-03-2018 06:34 PM |
|  | 3090 | 09-05-2018 07:44 PM |
|  | 2353 | 09-05-2018 07:31 PM |
05-07-2018
04:48 PM
@srinivas p *** Forum tip: Avoid responding to existing answers with a new answer. Instead, use comments to correspond within a single answer. - That being said, your environment is very different from the environment in this original question: far fewer nodes. Are you running the same version of HDF/NiFi? - I actually recommend starting a new question with your environment-specific details. You'll get more traction answer-wise that way. - Thanks, Matt
06-22-2017
05:26 PM
Awesome, thanks again!
06-23-2017
02:11 PM
@Alvin Jin Also, I know you already implemented a custom service, but there is also some work here by one of the Apache NiFi committers: https://github.com/apache/nifi/pull/1938
06-14-2017
07:28 PM
Apache NiFi 1.2, from the Apache download. I'll try it in Apache NiFi 1.3.
06-14-2017
04:21 PM
I'm not sure how to identify the Kafka broker version easily. I did find the file kafka_2.10-0.10.1.2.1.0.0-165.jar.asc in the /libs folder where Kafka is installed, so I am assuming I am running Kafka 0.10.1. I did get both the ConsumeKafka and ConsumeKafka_0_10 processors to work. Thanks. Now off to figure out why PutHiveStreaming doesn't work, but that will be for a different post.
06-13-2017
02:52 PM
We figured it out, Bryan. We didn't have a message demarcator set, and once we set it, the error went away! Thank you!
06-07-2017
08:48 PM
2 Kudos
@Matt Burgess Thank you so much.
06-06-2017
01:53 PM
1 Kudo
The session provides methods to read and write the flow file content:

- If you are reading only, then session.read with an InputStreamCallback will give you an InputStream to the flow file content.
- If you are writing only, then session.write with an OutputStreamCallback will give you an OutputStream to the flow file content.
- If you are reading and writing at the same time, then a StreamCallback will give you access to both an InputStream and an OutputStream.

In your case, if you are just looking to extract a value, then you likely need an InputStreamCallback, and you would use the InputStream to read the content and parse it appropriately for your data. You can look at examples in the existing processors: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L313-L318 Keep in mind, the above example reads the whole content of the flow file into memory, which can be dangerous when there are very large flow files, so whenever possible it is best to process the content in chunks.
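For illustration, here is a minimal sketch of the read-only case, assuming the NiFi 1.x API with `session` and `flowFile` already in scope inside a processor's onTrigger(); the variable names here are just for the example:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.nifi.processor.io.InputStreamCallback;
import org.apache.nifi.stream.io.StreamUtils;

// Inside onTrigger(), with 'session' and 'flowFile' in scope:
final AtomicReference<String> contentHolder = new AtomicReference<>();
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(final InputStream in) throws IOException {
        // Buffers the entire content in memory -- simple, but risky for very
        // large flow files, where chunked/streaming parsing is preferable.
        final byte[] buffer = new byte[(int) flowFile.getSize()];
        StreamUtils.fillBuffer(in, buffer);
        contentHolder.set(new String(buffer, StandardCharsets.UTF_8));
    }
});
// contentHolder.get() now holds the content, ready to be parsed for the value you need.
```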
06-01-2017
03:23 PM
1 Kudo
@Alvaro Dominguez The primary node could change at any time. You could use PostHTTP and ListenHTTP processors to route FlowFiles from multiple nodes to a single node. My concern would be heap usage to merge (zip) 160K FlowFiles on a single NiFi node: the FlowFile metadata for all those FlowFiles being zipped would be held in heap memory until the zip is complete. Any objection to having a zip of zips? In other words, you could still create 4 unique zip files (1 per node, each with a unique filename), then send those zipped files to one node to be zipped once more into a new zip with the single name you want written into HDFS. Thanks, Matt
05-25-2017
07:23 PM
As Matt pointed out, in order to make use of 100 concurrent tasks on a processor, you will need to increase the Maximum Timer Driven Thread Count above 100. Also, as Matt pointed out, this means each node would have that many threads available. As far as general performance... the performance of a single request/response with Jetty depends on what is being done in the request/response. We can't just say "Jetty can process thousands of records in seconds" unless we know what is being done with those records in Jetty. If you deployed a WAR with a servlet that immediately returned 200, the performance would be a lot different than a servlet that had to take the incoming request and write it to a database, an external system, or disk. With HandleHttpRequest/Response, each request becomes a flow file, which means updates to the flow file repository and content repository, which means disk I/O, and then transferring those flow files to the next processor, which reads them, which means more disk I/O. I'm not saying this can't be fast, but there is more happening there than just a servlet that returns 200 immediately. What I was getting at with the last question was that if you have 100 concurrent tasks on HandleHttpRequest and 1 concurrent task on HandleHttpResponse, eventually the response side will become the bottleneck.
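For reference, the trivial baseline described above might look like this (a hypothetical standalone servlet for comparison only, not NiFi code):

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The trivial baseline: no flow file creation, no repository updates, no disk I/O.
public class ImmediateOkServlet extends HttpServlet {
    @Override
    protected void doGet(final HttpServletRequest req, final HttpServletResponse resp)
            throws IOException {
        resp.setStatus(HttpServletResponse.SC_OK); // return 200 immediately
    }
}
```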