Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4274 | 12-03-2018 02:26 PM |
| | 3223 | 10-16-2018 01:37 PM |
| | 4334 | 10-03-2018 06:34 PM |
| | 3189 | 09-05-2018 07:44 PM |
| | 2437 | 09-05-2018 07:31 PM |
05-26-2017
05:54 PM
In the top-right corner of the NiFi UI there should be a search icon. Click it and enter the id you are looking for; it should show a list of results, and clicking a component in the results will take you right to it.
05-26-2017
05:14 PM
Look for Kerberos Principal and Kerberos Keytab on these pages:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hive-nar/1.2.0/org.apache.nifi.dbcp.hive.HiveConnectionPool/index.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hive-nar/1.2.0/org.apache.nifi.processors.hive.PutHiveStreaming/index.html
05-26-2017
04:02 PM
There are two different things here:
- HiveConnectionPool, for use with the PutHiveQL/SelectHiveQL processors, which go through the JDBC interface
- PutHiveStreaming, for ingesting through Hive streaming

You mentioned Hive streaming, but your log shows HiveConnectionPool, which is not for Hive streaming. Either way, HiveConnectionPool and PutHiveStreaming both have Kerberos Principal and Kerberos Keytab properties in their configuration that need to be filled in, and I believe when using HiveConnectionPool the JDBC connection string also needs the principal specified (see the example below). The nifi.kerberos.service.keytab.location and nifi.kerberos.service.principal properties in nifi.properties are not used by processors; they are only for framework-level things where NiFi needs to talk to another service.
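As a rough illustration (not taken from your config; the host, port, database, and principal are placeholders), a kerberized Hive JDBC URL for the HiveConnectionPool's Database Connection URL property usually looks something like this:

```java
// Hypothetical example of a kerberized Hive JDBC URL; adjust the host, port,
// database, and principal for your environment before using it as the
// HiveConnectionPool's Database Connection URL.
public class HiveJdbcUrlExample {
    public static void main(String[] args) {
        String databaseConnectionUrl =
                "jdbc:hive2://hive-host.example.com:10000/default;"
                + "principal=hive/_HOST@EXAMPLE.COM";
        System.out.println(databaseConnectionUrl);
    }
}
```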
05-26-2017
01:59 PM
1 Kudo
Also wanted to ask: in #2, when you said you restarted each instance one at a time, was that an upgrade from a previous version of NiFi to 1.2.0? The reason I ask is that currently you need to stop all nodes, upgrade the lib directories, and then restart. You won't be able to leave one node running 1.1.x and then bring up another node running 1.2.0.
05-26-2017
01:48 PM
"com.datalake.processors.SQLServerCDCProcessor from default:unknown:unversioned is not known to this NiFi instance." This means that when you restarted the node, it connected to the cluster and received the flow that the cluster was running, and that flow contained the processor above (SQLServerCDCProcessor), but the current node does not have that processor available. Can you verify that the NAR containing SQLServerCDCProcessor is in the lib directory of both instances, and that it is the exact same NAR?
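One quick way to check is to compare checksums of the NAR on each node. Here is a minimal sketch (my own standalone helper, not part of NiFi; the path is just an example) that prints a SHA-256 digest you can compare across nodes:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

// Standalone helper (not part of NiFi) that prints a SHA-256 checksum of a
// NAR file so the same file can be compared across nodes. The default path
// below is only an example.
public class NarChecksum {
    public static void main(String[] args) throws Exception {
        Path nar = Paths.get(args.length > 0 ? args[0] : "lib/my-custom-processors.nar");
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha256.digest(Files.readAllBytes(nar));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(nar + " -> " + hex);
    }
}
```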
05-25-2017
07:23 PM
As Matt pointed out, in order to make use of 100 concurrent tasks on a processor, you will need to increase Maximum Timer Driven Thread Count to over 100. Also, as Matt pointed out, this means each node would have that many threads available.

As far as general performance... the performance of a single request/response with Jetty depends on what is being done in the request/response. We can't just say "Jetty can process thousands of records in seconds" unless we know what is being done with those records in Jetty. If you deployed a WAR with a servlet that immediately returned 200, that performance would be a lot different than a servlet that had to take the incoming request and write it to a database, an external system, or disk.

With HandleHttpRequest/Response, each request becomes a flow file, which means updates to the flow file repository and content repository, which means disk I/O, and then transferring those flow files to the next processor, which reads them, which means more disk I/O. I'm not saying this can't be fast, but there is more happening there than just a servlet that returns 200 immediately.

What I was getting at with the last question was that if you have 100 concurrent tasks on HandleHttpRequest and 1 concurrent task on HandleHttpResponse, eventually the response part will become the bottleneck.
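To make the baseline concrete, here is a minimal sketch (assuming embedded Jetty 9.x and the javax.servlet API; it is not anything NiFi ships) of the "servlet that immediately returns 200" case. There are no repositories or disk I/O involved, which is why it isn't a fair comparison to HandleHttpRequest/Response:

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

// Minimal embedded Jetty server whose servlet returns 200 immediately,
// with no repository updates or disk I/O. Port 8080 is arbitrary.
public class ImmediateOkServer {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        ServletContextHandler context = new ServletContextHandler();
        context.addServlet(new ServletHolder(new HttpServlet() {
            @Override
            protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
                resp.setStatus(HttpServletResponse.SC_OK); // return 200 and do nothing else
            }
        }), "/*");
        server.setHandler(context);
        server.start();
        server.join();
    }
}
```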
05-25-2017
02:50 PM
Is there any pattern to the file that is missed? Is it always the file with the latest modification time of all the files in the directory? You can turn on DEBUG logging for org.apache.nifi.processors.hadoop.ListHDFS by editing logback.xml, and you should see some more information that might be helpful.
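For reference, the logger entry would look something like this, added inside the <configuration> element of logback.xml (the exact location of the file depends on your install):

```xml
<!-- Enable DEBUG logging for the ListHDFS processor -->
<logger name="org.apache.nifi.processors.hadoop.ListHDFS" level="DEBUG"/>
```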
05-25-2017
02:39 PM
1 Kudo
So I'm assuming your flow is HandleHttpRequest -> HandleHttpResponse? Is the queue between them filling up and hitting back-pressure, or does it look like HandleHttpRequest is not processing requests fast enough? For background, the way these processors work is the following (see the sketch at the end of this post):
- When HandleHttpRequest is started, it creates an embedded Jetty server with the default thread pool, which I believe has 200 threads by default.
- When you send messages, the embedded Jetty server handles them with the thread pool mentioned above and places them into an internal queue.
- The internal queue size is based on the processor's 'Container Queue Size' property.
- When the processor executes, it polls the queue to get one of the requests, creates a flow file with the content of the request, and transfers it to the next processor.
- The processor is executed by the number of concurrent tasks, in your case 100.
- Then HandleHttpResponse sends the response back to the client.

Here are a couple of things to consider:
- From your screenshots it shows you have 100 concurrent tasks set on HandleHttpRequest, which seems really high. Do you have servers powerful enough to support this? Have you increased the overall timer driven thread pool in NiFi to account for this?
- From your screenshot it also shows that you are in a cluster. Are you load balancing your requests across the NiFi nodes in the cluster, or are you sending all of them to only one node?
- What do you have concurrent tasks set to on HandleHttpResponse?
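Here is a rough sketch of the hand-off described above (my own simplification, not NiFi's actual code; the queue size, thread counts, and names are made up) showing why the bounded container queue and the number of polling tasks matter:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified illustration of the hand-off: "Jetty" threads offer requests to a
// bounded container queue, and "processor" tasks poll that queue to turn each
// request into a flow file. All sizes and names here are made up.
public class ContainerQueueSketch {
    public static void main(String[] args) throws Exception {
        int containerQueueSize = 50; // stands in for the 'Container Queue Size' property
        BlockingQueue<String> containerQueue = new ArrayBlockingQueue<>(containerQueueSize);

        // Stand-in for Jetty's worker threads accepting incoming requests.
        ExecutorService jettyThreads = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 200; i++) {
            final int requestId = i;
            jettyThreads.submit(() -> {
                if (!containerQueue.offer("request-" + requestId)) {
                    System.out.println("queue full, request " + requestId + " not accepted");
                }
            });
        }

        // Stand-in for the processor's concurrent tasks polling the queue.
        ExecutorService processorTasks = Executors.newFixedThreadPool(2);
        for (int t = 0; t < 2; t++) {
            processorTasks.submit(() -> {
                try {
                    String request;
                    while ((request = containerQueue.poll(1, TimeUnit.SECONDS)) != null) {
                        System.out.println("created flow file for " + request);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        jettyThreads.shutdown();
        processorTasks.shutdown();
        processorTasks.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```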
05-10-2017
01:42 PM
Correct, it wouldn't happen from regular restarts. I don't know exactly, but it's something like /usr/hdp/current/phoenix