Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4274 | 12-03-2018 02:26 PM |
| | 3223 | 10-16-2018 01:37 PM |
| | 4334 | 10-03-2018 06:34 PM |
| | 3189 | 09-05-2018 07:44 PM |
| | 2437 | 09-05-2018 07:31 PM |
05-26-2017
05:54 PM
In the top-right corner of the NiFi UI there should be a search icon. Click it and enter the id you are looking for; it should show a list of results, and clicking a component in the results will take you right to it.
05-26-2017
05:14 PM
Look for Kerberos Principal and Kerberos Keytab on these pages:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hive-nar/1.2.0/org.apache.nifi.dbcp.hive.HiveConnectionPool/index.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hive-nar/1.2.0/org.apache.nifi.processors.hive.PutHiveStreaming/index.html
05-26-2017
04:02 PM
There are two different things here:
- HiveConnectionPool, for use with the PutHiveQL/SelectHiveQL processors, which go through the JDBC interface
- PutHiveStreaming, for ingesting through Hive streaming

You mentioned Hive streaming, but your log shows HiveConnectionPool, which is not for Hive streaming. Either way, HiveConnectionPool and PutHiveStreaming both have Kerberos Principal and Kerberos Keytab properties in their configuration that need to be filled in, and I believe when using HiveConnectionPool the JDBC connection string also needs the principal specified (see the example below). The nifi.kerberos.service.keytab.location and nifi.kerberos.service.principal properties in nifi.properties are not used by processors; they are only for framework-level things where NiFi needs to talk to another service.
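As a rough illustration (not taken from your config; the host, port, database, and principal are placeholders), a kerberized Hive JDBC URL for the HiveConnectionPool's Database Connection URL property usually looks something like this:

```java
// Hypothetical example of a kerberized Hive JDBC URL; adjust the host, port,
// database, and principal for your environment before using it as the
// HiveConnectionPool's Database Connection URL.
public class HiveJdbcUrlExample {
    public static void main(String[] args) {
        String databaseConnectionUrl =
                "jdbc:hive2://hive-host.example.com:10000/default;"
                + "principal=hive/_HOST@EXAMPLE.COM";
        System.out.println(databaseConnectionUrl);
    }
}
```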
05-26-2017
01:59 PM
1 Kudo
Also wanted to ask: in #2, when you said you restarted each instance one at a time, was that an upgrade from a previous version of NiFi to 1.2.0? The reason I ask is that currently you need to stop all nodes, upgrade the lib directories, and then restart. You won't be able to leave one node running 1.1.x and then bring up another node running 1.2.0.
05-26-2017
01:48 PM
"com.datalake.processors.SQLServerCDCProcessor from default:unknown:unversioned is not known to this NiFi instance." This means that when you restarted the node, it connected to the cluster and received the flow that the cluster was running, and that flow contained the processor above (SQLServerCDCProcessor), but the current node does not have that processor available. Can you verify that the NAR containing SQLServerCDCProcessor is in the lib directory of both instances, and that it is the exact same NAR?
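One quick way to check is to compare checksums of the NAR on each node. Here is a minimal sketch (my own standalone helper, not part of NiFi; the path is just an example) that prints a SHA-256 digest you can compare across nodes:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

// Standalone helper (not part of NiFi) that prints a SHA-256 checksum of a
// NAR file so the same file can be compared across nodes. The default path
// below is only an example.
public class NarChecksum {
    public static void main(String[] args) throws Exception {
        Path nar = Paths.get(args.length > 0 ? args[0] : "lib/my-custom-processors.nar");
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha256.digest(Files.readAllBytes(nar));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(nar + " -> " + hex);
    }
}
```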
05-25-2017
07:23 PM
As Matt pointed out, in order to make use of 100 concurrent tasks on a processor, you will need to increase Maximum Timer Driven Thread Count to over 100. Also, as Matt pointed out, this means each node would have that many threads available.

As far as general performance... the performance of a single request/response with Jetty depends on what is being done in the request/response. We can't just say "Jetty can process thousands of records in seconds" unless we know what is being done with those records in Jetty. If you deployed a WAR with a servlet that immediately returned 200, that performance would be a lot different than a servlet that had to take the incoming request and write it to a database, an external system, or disk.

With HandleHttpRequest/Response, each request becomes a flow file, which means updates to the flow file repository and content repository, which means disk I/O, and then transferring those flow files to the next processor, which reads them, which means more disk I/O. I'm not saying this can't be fast, but there is more happening there than just a servlet that returns 200 immediately.

What I was getting at with the last question was that if you have 100 concurrent tasks on HandleHttpRequest and 1 concurrent task on HandleHttpResponse, eventually the response part will become the bottleneck.
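To make the baseline concrete, here is a minimal sketch (assuming embedded Jetty 9.x and the javax.servlet API; it is not anything NiFi ships) of the "servlet that immediately returns 200" case. There are no repositories or disk I/O involved, which is why it isn't a fair comparison to HandleHttpRequest/Response:

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

// Minimal embedded Jetty server whose servlet returns 200 immediately,
// with no repository updates or disk I/O. Port 8080 is arbitrary.
public class ImmediateOkServer {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        ServletContextHandler context = new ServletContextHandler();
        context.addServlet(new ServletHolder(new HttpServlet() {
            @Override
            protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
                resp.setStatus(HttpServletResponse.SC_OK); // return 200 and do nothing else
            }
        }), "/*");
        server.setHandler(context);
        server.start();
        server.join();
    }
}
```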
05-25-2017
02:50 PM
Is there any pattern to the file that is missed? Is it always the file with the latest modification time of all the files in the directory? You can turn on DEBUG logging for org.apache.nifi.processors.hadoop.ListHDFS by editing logback.xml, and you should see some more information that might be helpful.
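For reference, the logger entry would look something like this, added inside the <configuration> element of logback.xml (the exact location of the file depends on your install):

```xml
<!-- Enable DEBUG logging for the ListHDFS processor -->
<logger name="org.apache.nifi.processors.hadoop.ListHDFS" level="DEBUG"/>
```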
05-25-2017
02:39 PM
1 Kudo
So I'm assuming your flow is HandleHttpRequest -> HandleHttpResponse? Is the queue between them filling up and hitting back-pressure, or does it look like HandleHttpRequest is not processing requests fast enough? For background, the way these processors work is the following (see the sketch at the end of this post):
- When HandleHttpRequest is started, it creates an embedded Jetty server with the default thread pool, which I believe has 200 threads by default.
- When you send messages, the embedded Jetty server handles them with the thread pool mentioned above and places them into an internal queue.
- The internal queue size is based on the processor's 'Container Queue Size' property.
- When the processor executes, it polls the queue to get one of the requests, creates a flow file with the content of the request, and transfers it to the next processor.
- The processor is executed by the number of concurrent tasks, in your case 100.
- Then HandleHttpResponse sends the response back to the client.

Here are a couple of things to consider:
- From your screenshots it shows you have 100 concurrent tasks set on HandleHttpRequest, which seems really high. Do you have servers powerful enough to support this? Have you increased the overall timer driven thread pool in NiFi to account for this?
- From your screenshot it also shows that you are in a cluster. Are you load balancing your requests across the NiFi nodes in the cluster, or are you sending all of them to only one node?
- What do you have concurrent tasks set to on HandleHttpResponse?
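Here is a rough sketch of the hand-off described above (my own simplification, not NiFi's actual code; the queue size, thread counts, and names are made up) showing why the bounded container queue and the number of polling tasks matter:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified illustration of the hand-off: "Jetty" threads offer requests to a
// bounded container queue, and "processor" tasks poll that queue to turn each
// request into a flow file. All sizes and names here are made up.
public class ContainerQueueSketch {
    public static void main(String[] args) throws Exception {
        int containerQueueSize = 50; // stands in for the 'Container Queue Size' property
        BlockingQueue<String> containerQueue = new ArrayBlockingQueue<>(containerQueueSize);

        // Stand-in for Jetty's worker threads accepting incoming requests.
        ExecutorService jettyThreads = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 200; i++) {
            final int requestId = i;
            jettyThreads.submit(() -> {
                if (!containerQueue.offer("request-" + requestId)) {
                    System.out.println("queue full, request " + requestId + " not accepted");
                }
            });
        }

        // Stand-in for the processor's concurrent tasks polling the queue.
        ExecutorService processorTasks = Executors.newFixedThreadPool(2);
        for (int t = 0; t < 2; t++) {
            processorTasks.submit(() -> {
                try {
                    String request;
                    while ((request = containerQueue.poll(1, TimeUnit.SECONDS)) != null) {
                        System.out.println("created flow file for " + request);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        jettyThreads.shutdown();
        processorTasks.shutdown();
        processorTasks.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```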
05-10-2017
01:42 PM
Correct, it wouldn't happen from regular restarts. I don't know exactly, but it's something like /usr/hdp/current/phoenix