Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4168 | 12-03-2018 02:26 PM |
|  | 3135 | 10-16-2018 01:37 PM |
|  | 4253 | 10-03-2018 06:34 PM |
|  | 3090 | 09-05-2018 07:44 PM |
|  | 2353 | 09-05-2018 07:31 PM |
05-07-2018
04:48 PM
@srinivas p *** Forum tip: Avoid responding to existing answers with a new answer. Instead, use comments to correspond within a single answer. - That being said, your environment is very different from the environment in this original question: far fewer nodes. Are you running the same version of HDF/NiFi? - I actually recommend starting a new question with your environment-specific details. You'll get more traction answer-wise that way. - Thanks, Matt
06-22-2017
05:26 PM
Awesome, thanks again!
06-23-2017
02:11 PM
@Alvin Jin Also, I know you already implemented a custom service, but there is also some work here by one of the Apache NiFi committers: https://github.com/apache/nifi/pull/1938
06-14-2017
07:28 PM
Apache NiFi 1.2, from the Apache download. I'll try it in Apache NiFi 1.3.
06-14-2017
04:21 PM
I'm not sure how to identify the Kafka broker version easily. I did find the file kafka_2.10-0.10.1.2.1.0.0-165.jar.asc in the /libs folder where Kafka is installed, so I am assuming I am running Kafka 0.10.1. I did get both the ConsumeKafka and ConsumeKafka_0_10 processors to work. Thanks. Now off to figure out why PutHiveStreaming doesn't work, but that will be for a different post.
06-13-2017
02:52 PM
We figured it out, Bryan. We didn't have a message demarcator set, and once we set it, the error went away! Thank you!
06-07-2017
08:48 PM
2 Kudos
@Matt Burgess Thank you so much.
06-06-2017
01:53 PM
1 Kudo
The session provides methods to read and write the flow file content:

- If you are reading only, then session.read with an InputStreamCallback will give you an InputStream to the flow file content.
- If you are writing only, then session.write with an OutputStreamCallback will give you an OutputStream to the flow file content.
- If you are reading and writing at the same time, then a StreamCallback will give you access to both an InputStream and an OutputStream.

In your case, if you are just looking to extract a value, then you likely need an InputStreamCallback, and you would use the InputStream to read the content and parse it appropriately for your data. You can look at examples in the existing processors: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L313-L318 Keep in mind, the above example reads the whole content of the flow file into memory, which can be dangerous when there are very large flow files, so whenever possible it is best to process the content in chunks.
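For illustration, here is a minimal sketch of the read-only case, assuming the NiFi 1.x API with `session` and `flowFile` already in scope inside a processor's onTrigger(); the variable names here are just for the example:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.nifi.processor.io.InputStreamCallback;
import org.apache.nifi.stream.io.StreamUtils;

// Inside onTrigger(), with 'session' and 'flowFile' in scope:
final AtomicReference<String> contentHolder = new AtomicReference<>();
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(final InputStream in) throws IOException {
        // Buffers the entire content in memory -- simple, but risky for very
        // large flow files, where chunked/streaming parsing is preferable.
        final byte[] buffer = new byte[(int) flowFile.getSize()];
        StreamUtils.fillBuffer(in, buffer);
        contentHolder.set(new String(buffer, StandardCharsets.UTF_8));
    }
});
// contentHolder.get() now holds the content, ready to be parsed for the value you need.
```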
06-01-2017
03:23 PM
1 Kudo
@Alvaro Dominguez The primary node could change at any time. You could use PostHTTP and ListenHTTP processors to route FlowFiles from multiple nodes to a single node. My concern would be heap usage to merge (zip) 160K FlowFiles on a single NiFi node: the FlowFile metadata for all those FlowFiles being zipped would be held in heap memory until the zip is complete. Any objection to having a zip of zips? In other words, you could still create 4 unique zip files (1 per node, each with a unique filename), then send those zipped files to one node to be zipped once more into a new zip with the single name you want written into HDFS. Thanks, Matt
05-25-2017
07:23 PM
As Matt pointed out, in order to make use of 100 concurrent tasks on a processor, you will need to increase the Maximum Timer Driven Thread Count above 100. Also, as Matt pointed out, this means each node would have that many threads available. As far as general performance... the performance of a single request/response with Jetty depends on what is being done in the request/response. We can't just say "Jetty can process thousands of records in seconds" unless we know what is being done with those records in Jetty. If you deployed a WAR with a servlet that immediately returned 200, the performance would be a lot different than a servlet that had to take the incoming request and write it to a database, an external system, or disk. With HandleHttpRequest/Response, each request becomes a flow file, which means updates to the flow file repository and content repository, which means disk I/O, and then transferring those flow files to the next processor, which reads them, which means more disk I/O. I'm not saying this can't be fast, but there is more happening there than just a servlet that returns 200 immediately. What I was getting at with the last question was that if you have 100 concurrent tasks on HandleHttpRequest and 1 concurrent task on HandleHttpResponse, eventually the response side will become the bottleneck.
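For reference, the trivial baseline described above might look like this (a hypothetical standalone servlet for comparison only, not NiFi code):

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The trivial baseline: no flow file creation, no repository updates, no disk I/O.
public class ImmediateOkServlet extends HttpServlet {
    @Override
    protected void doGet(final HttpServletRequest req, final HttpServletResponse resp)
            throws IOException {
        resp.setStatus(HttpServletResponse.SC_OK); // return 200 immediately
    }
}
```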