Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4258 | 12-03-2018 02:26 PM |
| | 3199 | 10-16-2018 01:37 PM |
| | 4305 | 10-03-2018 06:34 PM |
| | 3164 | 09-05-2018 07:44 PM |
| | 2423 | 09-05-2018 07:31 PM |
10-19-2016 05:53 PM
3 Kudos
Option 1 seems fine if you are able to open the firewall port. For option 2, rather than writing a custom Java process, you could run a NiFi inside the secure cluster using ConsumeKafka to consume the messages and then use the appropriate follow-on processors (PutHDFS, PutHiveQL, PutHBaseJson, etc.). You still use Kafka as the gateway into the cluster, but don't have to write any custom code.
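For reference, a rough sketch of what that flow's key settings might look like (broker, topic, group id, and target directory are all placeholders):

```
ConsumeKafka
  Kafka Brokers : broker1:9092     <- placeholder broker inside the secure cluster
  Topic Name(s) : ingest-topic     <- placeholder topic
  Group ID      : nifi-ingest      <- consumer group for this flow
      | success
      v
PutHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /data/ingest   <- placeholder target directory
```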
10-19-2016 01:55 PM
If you use timer scheduling it will still execute right away; if you set 30 seconds it will run immediately and then wait 30 seconds before running again. Can you provide all of the configuration you entered for the processor? You need to set the "Maximum-value Columns" property in order for it to track where it left off and pick up there on the next execution.
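For example, assuming a hypothetical table `orders` with an auto-incrementing `id` column, the relevant settings would look something like:

```
QueryDatabaseTable
  Scheduling tab:
    Scheduling Strategy : Timer driven
    Run Schedule        : 30 sec
  Properties tab:
    Database Connection Pooling Service : (your DBCPConnectionPool)
    Table Name                          : orders   <- hypothetical table
    Maximum-value Columns               : id       <- processor remembers the largest id it has seen
```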
10-19-2016 01:24 PM
1 Kudo
I suspect it is something more on the Hive side of things, which is out of my domain. Increasing the concurrent tasks on the PutHiveQL processor is the appropriate approach on the NiFi side; somewhere between 1 and 5 concurrent tasks is usually enough, but the concurrent tasks can only work as fast as whatever they are calling. If all 10 of your threads go to make a call to the Hive JDBC driver, and 2 of them are doing work while 8 are blocking on something in Hive, then there isn't much NiFi can do.
10-19-2016 12:57 PM
3 Kudos
PutHiveQL is not really intended for "massive ingest" purposes since it goes through the Hive JDBC driver, which has a lot of overhead per insert. PutHiveStreaming would probably be what you want to use, or just write the data to a directory in HDFS (using PutHDFS) and create a Hive external table on top of it.
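As a sketch of the external table route, assuming the flow drops comma-delimited files into a hypothetical HDFS directory /data/ingest:

```sql
-- Hypothetical columns and path; adjust to match the data your flow writes.
CREATE EXTERNAL TABLE ingest_events (
  event_time STRING,
  message    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/ingest';
```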
10-19-2016 12:25 PM
2 Kudos
How did you schedule QueryDatabaseTable? If you didn't change anything on the Scheduling tab of the processor, then the Run Schedule is 0 seconds, which means it runs as fast as possible. You most likely want to run this on some kind of timer or cron scheduling.
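For example, on the Scheduling tab, either of these would work (values are illustrative):

```
Scheduling Strategy : Timer driven    Run Schedule : 5 min
-- or --
Scheduling Strategy : CRON driven     Run Schedule : 0 0 * * * ?   <- top of every hour (Quartz syntax)
```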
10-18-2016 02:21 AM
Yes, ListenSyslog is intended to listen for incoming network connections from remote syslog servers that are forwarding messages over TCP or UDP. You just point rsyslog at the host where NiFi is running and the port ListenSyslog is listening on. Example with rsyslog: https://blogs.apache.org/nifi/entry/storing_syslog_events_in_hbase
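For example, a minimal forwarding rule in /etc/rsyslog.conf (host and port are placeholders and must match your ListenSyslog configuration):

```
# Forward all messages to NiFi over TCP (@@ = TCP, a single @ = UDP)
*.* @@nifi-host.example.com:7780
```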
10-17-2016 05:26 PM
3 Kudos
I don't think you can store the index on local disk and on HDFS at the same time. The location of the index is based on which directory factory is being used, and in the case of HDFS you would be using the HDFS directory factory, which only stores the index on HDFS. See the following:
https://cwiki.apache.org/confluence/display/solr/DataDir+and+DirectoryFactory+in+SolrConfig
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
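For reference, the HDFS case in solrconfig.xml looks roughly like this (paths are placeholders):

```xml
<!-- The index lives only on HDFS; there is no factory that writes to both locations -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>
```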
10-17-2016 05:22 PM
If you added the correct dependencies in your processor's pom file then they should be included in the generated NAR, as Andrew mentioned above. Please verify you have the correct project structure described here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions#MavenProjectsforExtensions-ProcessorProjects There should be a Maven project with two sub-modules: one for the processors JAR, where you declare your dependencies in the pom file, and another for building the NAR.
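A sketch of that layout, with hypothetical artifact names:

```
nifi-example-bundle/             <- parent pom (packaging: pom)
├── nifi-example-processors/     <- packaging: jar; declare your dependencies in this pom
└── nifi-example-nar/            <- packaging: nar; depends on nifi-example-processors
```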
10-12-2016 01:25 PM
It is a resource identifier for a policy that should be auto-generated for each node you put in "Node Identities". Normally you would manage this from the global policies in the top-right menu of the UI, but since you can't get into the UI you can check users.xml and authorizations.xml instead. There should be a user for each cluster node in users.xml, and a policy in authorizations.xml for /proxy that all of the cluster node users belong to.
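The entries look roughly like this (identifiers and DNs are illustrative):

```xml
<!-- users.xml: one user entry per cluster node -->
<user identifier="node-1-uuid" identity="CN=node1.example.com, OU=NIFI"/>

<!-- authorizations.xml: the /proxy policy listing those users -->
<policy identifier="proxy-policy-uuid" resource="/proxy" action="R">
  <user identifier="node-1-uuid"/>
</policy>
```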
10-12-2016 01:13 PM
1 Kudo
Was the DN in the access-denied message one of the DNs in the Node Identities? Every node in the cluster needs READ access on /proxy, which is defined through the global policies in the top-right menu. The Node Identities should get this policy automatically, but you can double-check whether your node is in the list for /proxy and, if not, add it.