Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3367 | 12-03-2018 02:26 PM
 | 2316 | 10-16-2018 01:37 PM
 | 3633 | 10-03-2018 06:34 PM
 | 2408 | 09-05-2018 07:44 PM
 | 1834 | 09-05-2018 07:31 PM
08-03-2016
04:10 PM
3 Kudos
ConsumeKafka and PublishKafka use the Kafka 0.9 client library. GetKafka and PutKafka use the Kafka 0.8 client library.
07-28-2016
04:57 PM
6 Kudos
The flow you see in the UI comes from flow.xml.gz in the conf directory; it is unrelated to the H2 database. If you installed two NiFi instances you should have two completely separate directories, like /opt/nifi1 and /opt/nifi2, each with its own conf sub-directory and its own flow.xml.gz. If you had an existing NiFi instance with a flow and then copied that directory to nifi2, it would start with the same flow, but from there any changes you make to either instance stay separate.
07-28-2016
12:53 AM
Glad to hear it is working. As far as the broker, NiFi is not doing anything special, it just takes the value entered in the processor config and passes it to Kafka as the property ProducerConfig.BOOTSTRAP_SERVERS_CONFIG (which is "bootstrap.servers"). So it really comes down to what Kafka does with this list. I would think it can be any hostname or IP that can be resolved.
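For illustration, that means the end result is the same as setting the standard producer property yourself; the broker host names below are hypothetical:

```properties
# ProducerConfig.BOOTSTRAP_SERVERS_CONFIG is the constant for "bootstrap.servers"
bootstrap.servers=broker1.example.com:9092,broker2.example.com:9092
```

Kafka only needs enough of these entries to bootstrap an initial connection; it discovers the rest of the cluster from there.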
07-28-2016
12:33 AM
4 Kudos
You can attach the debugger from your IDE to NiFi... In NiFi's conf directory, bootstrap.conf has a line commented out like this: #java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
If you uncomment that line and restart NiFi, the Java process will listen for a debug connection on port 8000. If you want the process to wait for a connection before starting, you can also set suspend=y. Connecting your IDE debugger to the Java process is specific to which IDE you are using. You can also add additional logging statements in your processor code using the logger returned from getLogger() to see how far it is getting. The log levels are controlled through the logback.xml file in the conf directory. The default level for processors is WARN: <logger name="org.apache.nifi.processors" level="WARN"/>
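For example, if you wanted more detail from the standard processors while debugging, the entry in conf/logback.xml might look like this with the level raised (DEBUG here is just an example value):

```xml
<logger name="org.apache.nifi.processors" level="DEBUG"/>
```

Depending on your version, logback may be configured to rescan this file periodically, in which case the change is picked up without restarting NiFi.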
07-27-2016
08:07 PM
I just realized something... GetKafka and PutKafka use kafka-client 0.8.2, and there are newer processors in NiFi 0.7.0 called ConsumeKafka and PublishKafka which use kafka-client 0.9.0.1. Since you are using Kafka 0.9, I think we should be using ConsumeKafka here. Let's see if that works any better, and sorry for the confusion.
07-27-2016
07:29 PM
Hi Stephanie, were you able to try creating another GetKafka processor for the same topic on your NiFi graph to see if it experiences the same problem? The reason I wanted to try this is that each GetKafka processor has a Client Name property tied to the processor's id; it ends up being something like "NiFi-<uuid>". That id gets stored somewhere in ZooKeeper to identify the consumer, so I wanted to see whether creating a new GetKafka processor, and thus a new client id, hits the same problem or not.
07-27-2016
06:00 PM
Glad we got past the first problem... out of curiosity, if you created another new GetKafka processor pointing at the same Kafka topic, does it also get the same error? I was reading this old thread that seemed related, and it hinted that some kind of bad state might be stuck in ZooKeeper: http://apache-nifi-developer-list.39713.n7.nabble.com/GetKafka-blowing-up-with-assertion-error-in-Kafka-client-code-td9098.html
07-27-2016
01:06 PM
What do you want to do with the hashtags? If you want to get a new flow file for each hashtag you can use the SplitJson processor with a JSONPath value of $.twitter.hashtags
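To illustrate the splitting behavior outside of NiFi, here is a minimal Python sketch of what SplitJson does with that JSONPath; the tweet structure below is a hypothetical example, not the exact Twitter schema:

```python
import json

# Hypothetical input flow file content with a $.twitter.hashtags array.
tweet = json.loads("""
{
  "twitter": {
    "text": "hello world",
    "hashtags": [{"text": "nifi"}, {"text": "kafka"}]
  }
}
""")

# SplitJson emits one flow file per element of the matched array.
splits = [json.dumps(h) for h in tweet["twitter"]["hashtags"]]
for s in splits:
    print(s)
```

Each element of splits corresponds to one output flow file on SplitJson's "split" relationship.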
07-27-2016
02:48 AM
6 Kudos
As you mentioned, NiFi does offer many capabilities that can be used to perform ETL functionality. In general, though, NiFi is more of a general-purpose dataflow tool. NiFi clusters usually scale to dozens of nodes, so if your "transform" needs to run on 100s (or 1000s) of nodes in parallel, it may be better to use NiFi to bring data to another processing framework like Storm, Spark, etc. Similarly for the "extract" part... NiFi can extract from relational databases with the ExecuteSQL and QueryDatabaseTable processors, which covers many use cases, and for more extreme use cases something like Sqoop can leverage a much larger Hadoop cluster to perform the extraction. So, as always, there is no single correct answer and it depends a lot on the use case.
07-27-2016
02:18 AM
It looks like there are two different connections being attempted here... the first one is to 52.90.171.224:2181, and the one where the error is coming from is to localhost:2181. Can you confirm whether 52.90.171.224 is a remote server or the IP of the machine where NiFi is running? Also, are you running a NiFi cluster or a single instance? And if a cluster, are you running an embedded ZooKeeper for NiFi's state management? I'm just trying to see if ZooKeeper is being used for anything else here.
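As a quick way to check which of those addresses actually has a ZooKeeper answering, here is a small Python sketch using ZooKeeper's four-letter "ruok" command (a healthy server replies "imok"); the hosts you would pass in are the ones from your logs:

```python
import socket

def zk_ruok(host, port=2181, timeout=5):
    """Send ZooKeeper's 'ruok' four-letter command and return the reply.
    A healthy ZooKeeper server responds with 'imok'."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"ruok")
        return s.recv(16).decode()

# e.g. zk_ruok("52.90.171.224") and zk_ruok("localhost")
```

If one address answers and the other refuses or times out, that narrows down which ZooKeeper the error is really about. (Note that newer ZooKeeper releases require four-letter commands to be whitelisted.)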