Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4258 | 12-03-2018 02:26 PM |
| | 3199 | 10-16-2018 01:37 PM |
| | 4305 | 10-03-2018 06:34 PM |
| | 3164 | 09-05-2018 07:44 PM |
| | 2423 | 09-05-2018 07:31 PM |
10-19-2016 05:53 PM
3 Kudos
Option 1 seems fine if you are able to open the firewall port. For option 2, rather than writing a custom Java process, you could run a NiFi inside the secure cluster using ConsumeKafka to consume the messages and then use the appropriate follow-on processors (PutHDFS, PutHiveQL, PutHBaseJson, etc.). You still use Kafka as the gateway into the cluster, but don't have to write any custom code.
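For reference, a rough sketch of what that flow's key settings might look like (broker, topic, group id, and target directory are all placeholders):

```
ConsumeKafka
  Kafka Brokers : broker1:9092     <- placeholder broker inside the secure cluster
  Topic Name(s) : ingest-topic     <- placeholder topic
  Group ID      : nifi-ingest      <- consumer group for this flow
      | success
      v
PutHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /data/ingest   <- placeholder target directory
```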
10-19-2016 01:55 PM
If you use timer scheduling it will still execute right away; if you set 30 seconds it will run immediately and then wait 30 seconds before running again. Can you provide all of the configuration you entered for the processor? You need to set the "Maximum-value Columns" property in order for it to track where it left off and pick up there on the next execution.
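For example, assuming a hypothetical table `orders` with an auto-incrementing `id` column, the relevant settings would look something like:

```
QueryDatabaseTable
  Scheduling tab:
    Scheduling Strategy : Timer driven
    Run Schedule        : 30 sec
  Properties tab:
    Database Connection Pooling Service : (your DBCPConnectionPool)
    Table Name                          : orders   <- hypothetical table
    Maximum-value Columns               : id       <- processor remembers the largest id it has seen
```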
10-19-2016 01:24 PM
1 Kudo
I suspect it is something more on the Hive side of things, which is out of my domain. Increasing the concurrent tasks on the PutHiveQL processor is the appropriate approach on the NiFi side; somewhere between 1 and 5 concurrent tasks is usually enough, but the concurrent tasks can only work as fast as whatever they are calling. If all 10 of your threads go to make a call to the Hive JDBC driver, and 2 of them are doing work while 8 are blocking on something in Hive, then there isn't much NiFi can do.
10-19-2016 12:57 PM
3 Kudos
PutHiveQL is not really intended for "massive ingest" purposes since it goes through the Hive JDBC driver, which has a lot of overhead per insert. PutHiveStreaming would probably be what you want to use, or just write the data to a directory in HDFS (using PutHDFS) and create a Hive external table on top of it.
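As a sketch of the external table route, assuming the flow drops comma-delimited files into a hypothetical HDFS directory /data/ingest:

```sql
-- Hypothetical columns and path; adjust to match the data your flow writes.
CREATE EXTERNAL TABLE ingest_events (
  event_time STRING,
  message    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/ingest';
```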
10-19-2016 12:25 PM
2 Kudos
How did you schedule QueryDatabaseTable? If you didn't change anything on the Scheduling tab of the processor, then the Run Schedule is 0 seconds, which means it runs as fast as possible. You most likely want to run this on some kind of timer or cron scheduling.
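For example, on the Scheduling tab, either of these would work (values are illustrative):

```
Scheduling Strategy : Timer driven    Run Schedule : 5 min
-- or --
Scheduling Strategy : CRON driven     Run Schedule : 0 0 * * * ?   <- top of every hour (Quartz syntax)
```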
10-18-2016 02:21 AM
Yes, ListenSyslog is intended to listen for incoming network connections from remote syslog servers that are forwarding messages over TCP or UDP. You just point rsyslog at the host where NiFi is running and the port ListenSyslog is listening on. Example with rsyslog: https://blogs.apache.org/nifi/entry/storing_syslog_events_in_hbase
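For example, a minimal forwarding rule in /etc/rsyslog.conf (host and port are placeholders and must match your ListenSyslog configuration):

```
# Forward all messages to NiFi over TCP (@@ = TCP, a single @ = UDP)
*.* @@nifi-host.example.com:7780
```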
10-17-2016 05:26 PM
3 Kudos
I don't think you can store the index on local disk and on HDFS at the same time. The location of the index is based on which directory factory is being used, and in the case of HDFS you would be using the HDFS directory factory, which only stores the index on HDFS. See the following:
https://cwiki.apache.org/confluence/display/solr/DataDir+and+DirectoryFactory+in+SolrConfig
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
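For reference, the HDFS case in solrconfig.xml looks roughly like this (paths are placeholders):

```xml
<!-- The index lives only on HDFS; there is no factory that writes to both locations -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>
```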
10-17-2016 05:22 PM
If you added the correct dependencies in your processor's pom file then they should be included in the generated NAR, as Andrew mentioned above. Please verify you have the correct project structure described here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions#MavenProjectsforExtensions-ProcessorProjects There should be a Maven project with two sub-modules: one for the processors JAR, where you declare your dependencies in the pom file, and another for building the NAR.
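A sketch of that layout, with hypothetical artifact names:

```
nifi-example-bundle/             <- parent pom (packaging: pom)
├── nifi-example-processors/     <- packaging: jar; declare your dependencies in this pom
└── nifi-example-nar/            <- packaging: nar; depends on nifi-example-processors
```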
10-12-2016 01:25 PM
It is a resource identifier for a policy that should be auto-generated for each node you put in "Node Identities". Normally you would manage this from the global policies in the top-right menu of the UI, but since you can't get into the UI you can check users.xml and authorizations.xml instead. There should be a user for each cluster node in users.xml, and a policy in authorizations.xml for /proxy that all of the cluster node users belong to.
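The entries look roughly like this (identifiers and DNs are illustrative):

```xml
<!-- users.xml: one user entry per cluster node -->
<user identifier="node-1-uuid" identity="CN=node1.example.com, OU=NIFI"/>

<!-- authorizations.xml: the /proxy policy listing those users -->
<policy identifier="proxy-policy-uuid" resource="/proxy" action="R">
  <user identifier="node-1-uuid"/>
</policy>
```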
10-12-2016 01:13 PM
1 Kudo
Was the DN in the access-denied message one of the DNs in the Node Identities? Every node in the cluster needs READ access on /proxy, which is defined through the global policies in the top-right menu. The Node Identities should get this policy automatically, but you can double-check whether your node is in the list for /proxy and, if not, add it.