Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3348 | 12-03-2018 02:26 PM
 | 2302 | 10-16-2018 01:37 PM
 | 3615 | 10-03-2018 06:34 PM
 | 2392 | 09-05-2018 07:44 PM
 | 1814 | 09-05-2018 07:31 PM
05-12-2016
05:14 PM
8 Kudos
Processors can be renamed on the first tab of the processor's configuration window. You could name each processor after its table to easily identify it when looking at the flow.
05-03-2016
05:11 PM
3 Kudos
HDF (NiFi) moves data around as flow files, and each flow file is made up of metadata attributes and content, where the content is just bytes. There is no internal data format in which NiFi knows about "fields", so it can't provide a generic way to encrypt fields. It provides out-of-the-box processors called EncryptContent and DecryptContent, which can encrypt and decrypt the entire content of a flow file. If you have a known data format like CSV, JSON, etc., and want to encrypt individual fields within that content, it would likely require a custom processor to interpret that format and apply the encryption. It may be possible to do this with the ExecuteScript processor, but a custom Java processor would definitely be possible; a sketch of the per-field logic is below.
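To make the custom-processor route concrete, here is a minimal, hypothetical sketch of the per-field encryption logic such a processor could apply after parsing a field value out of the content. The FieldEncryptor class is made up for this example, and it uses Java's default AES transformation (ECB) purely for brevity; a real implementation should use an authenticated mode such as AES/GCM with a random IV.

    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class FieldEncryptor {

        private final SecretKeySpec key;

        public FieldEncryptor(byte[] rawKey) {
            // AES requires a 16-, 24-, or 32-byte key.
            this.key = new SecretKeySpec(rawKey, "AES");
        }

        // Encrypts a single parsed-out field value, returning it Base64-encoded
        // so it can be written back into a text format like CSV or JSON.
        public String encryptField(String value) throws Exception {
            // Default AES transformation (ECB/PKCS5Padding); for brevity only.
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] encrypted = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        }

        // Reverses encryptField, for the matching decrypt processor.
        public String decryptField(String encoded) throws Exception {
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] decrypted = cipher.doFinal(Base64.getDecoder().decode(encoded));
            return new String(decrypted, StandardCharsets.UTF_8);
        }
    }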
04-29-2016
03:08 PM
12 Kudos
Apache NiFi provides several processors for receiving data over a network connection, such as ListenTCP, ListenSyslog, and ListenUDP. These processors are implemented using a common approach and a shared framework for receiving and processing messages. When one of these processors is scheduled to run, a separate thread is started to listen for incoming network connections. When a connection is accepted, another thread is created to read from that connection; we will call this the channel reader (the original thread continues listening for additional connections). The channel reader reads data from the connection as fast as possible, placing each received message into a blocking queue. The blocking queue is a reference shared with the processor: each time the processor executes, it polls the queue for one or more messages and writes them to a flow file.

There are several competing activities taking place:

- The data producer is writing data to the socket, where it is held in the socket's buffer, while at the same time the channel reader is reading the data held in that buffer. If data is written faster than it is read, the buffer will eventually fill up and incoming data will be dropped.
- The channel reader is pushing messages into the message queue, while the processor concurrently pulls messages off the queue. If messages are pushed in faster than the processor can pull them out, the queue will reach maximum capacity, causing the channel reader to block while waiting for space. While blocked, the channel reader is not reading from the connection, which means the socket buffer can start filling up and cause data to be dropped.

A simplified sketch of this reader/queue hand-off is shown below.
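This is only a minimal illustration of the pattern, not NiFi's actual implementation: it assumes line-delimited text messages, and the class and method names are made up. The real framework handles message demarcation, buffering, and error handling far more carefully.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class ChannelReaderSketch {

        // Bounded queue shared between the channel reader and the processor,
        // analogous to Max Size of Message Queue.
        private final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>(10_000);

        // Runs on its own thread for each accepted connection.
        void readChannel(Socket socket) throws Exception {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                String message;
                while ((message = reader.readLine()) != null) {
                    // put() blocks when the queue is full, which is exactly the
                    // point at which the socket buffer can begin to fill up.
                    messageQueue.put(message);
                }
            }
        }

        // Called on each processor execution: drains up to maxBatchSize
        // messages, which would then be written to a single flow file.
        List<String> pollMessages(int maxBatchSize) {
            List<String> batch = new ArrayList<>();
            messageQueue.drainTo(batch, maxBatchSize);
            return batch;
        }
    }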
To achieve optimal performance, these processors expose several properties to tune these competing activities:
- Max Batch Size – controls how many messages are pulled from the message queue and written to a single flow file during one execution of the processor. The default value is 1, which means one message per flow file. A single message per flow file is useful for downstream parsing and routing, but it is the worst-performing scenario. Increasing the batch size drastically reduces the number of I/O operations performed and will likely provide the greatest overall performance improvement.
- Max Size of Message Queue – controls the size of the blocking queue used between the channel reader and the processor. In cases where large bursts of data are expected, and enough memory is available to the JVM, this value can be increased to allow more room for internal buffering. Keep in mind that this queue is held in memory; setting it too large could adversely affect the overall memory of the NiFi instance, causing problems for the whole flow and not just this processor.
- Max Size of Socket Buffer – attempts to tell the operating system what size to use for the socket buffer. Increasing this value provides additional breathing room for bursts of data or momentary pauses while reading. In some cases the buffer size must also be configured at the operating-system level in order to take effect; the processor will log a warning if it was unable to set the buffer to the desired size.
- Concurrent Tasks – a property on the Scheduling tab of every processor. Here, increasing concurrent tasks means pulling messages off the message queue more quickly, and thus more quickly freeing up room for the channel reader to push in more messages. With a larger Max Batch Size, each concurrent task batches together that many messages in a single execution.

Adjusting the above values appropriately should provide the ability to tune for high-throughput scenarios. A good approach is to start by increasing Max Batch Size from 1 to 1000 and observe performance. From there, a slight increase to Max Size of Socket Buffer, and increasing Concurrent Tasks from 1 to 2, should provide additional improvements.
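To put the batching effect in rough, illustrative numbers (not a benchmark): at 10,000 incoming messages per second, a Max Batch Size of 1 produces 10,000 flow files per second, while a Max Batch Size of 1000 produces only about 10 flow files per second for the same data, cutting the number of flow files (and the associated repository I/O) by three orders of magnitude.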
04-28-2016
03:04 PM
3 Kudos
It does not need to be on the source server; the processor listens on the configured port for incoming connections. A syslog server can be configured to forward messages to the host and port that the processor is listening on. Once connections are accepted, it reads as fast as possible. The main properties to consider tuning are Max Batch Size, Max Size of Socket Buffer, and Max Size of Message Queue, but the right values depend a lot on your environment and the amount of data. An example forwarding rule is shown below.
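As one concrete example, an rsyslog forwarding rule like the following would send all messages to a ListenSyslog processor; the hostname and port here are placeholders for whatever the processor is configured with:

    # "@@" forwards over TCP; a single "@" would forward over UDP.
    # nifi-host and 6514 are placeholders.
    *.* @@nifi-host:6514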
04-28-2016
02:18 AM
1 Kudo
I don't think anyone has done anything yet, but I've wanted to for a while. It should be straightforward to wrap the Jedis client in some processors; the core calls are about as simple as the sketch below.
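For a sense of how little client-side code is involved, here is a minimal, hypothetical example of the Jedis calls a put-style Redis processor might wrap. The processor concept, key, and value are made up; the host and port are placeholders.

    import redis.clients.jedis.Jedis;

    public class RedisSketch {
        public static void main(String[] args) {
            // Connect to a Redis instance (host/port are placeholders).
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // A put-style processor might map a flow file attribute to the
                // key and the flow file content to the value.
                jedis.set("flowfile-key", "flowfile-content");
                System.out.println(jedis.get("flowfile-key"));
            }
        }
    }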
04-20-2016
03:10 PM
@Anoop Nair adding FetchHBase to the nifi-hbase-bundle sounds like the right approach. The HBase processors rely on a Controller Service (HBaseClientService) to interact with HBase, which hides the HBase client from the processors. There is currently a scan method in HBaseClientService:

    public void scan(final String tableName, final Collection<Column> columns,
            final String filterExpression, final long minTime,
            final ResultHandler handler) throws IOException;

Technically you could probably use this with a filter expression limiting to the row id you want, but it would be a lot more efficient to create a new method in that interface that takes the table, row id, and columns to return; a possible shape for that method is sketched below.
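To be clear, this signature is only an illustration of the idea, not an existing method in the interface:

    // Hypothetical addition to HBaseClientService: fetch a single row by id,
    // optionally restricted to the given columns, handing results to the
    // existing ResultHandler callback.
    void fetch(final String tableName, final byte[] rowId,
            final Collection<Column> columns, final ResultHandler handler)
            throws IOException;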
04-20-2016
02:27 PM
@Anoop Nair if you do get something working, we would be happy to help you contribute it back to NiFi if you would like to, although it is perfectly fine to maintain it outside of NiFi as well. Let us know if we can help.
04-19-2016
06:37 PM
Or are you getting the row id, column family, and column qualifier from Kafka, and you want to fetch a single cell?
04-19-2016
06:30 PM
I created this JIRA: https://issues.apache.org/jira/browse/NIFI-1784 How would you want the data for the row to be represented in a NiFi flow file? A JSON document where the key/value pairs are column qualifiers and values, like the example below?
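For instance, with purely made-up qualifiers and values, one row might come out as:

    {
      "name": "John Doe",
      "email": "jdoe@example.com"
    }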
04-18-2016
06:54 PM
1 Kudo
There is a GetHBase processor described here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.hbase.GetHBase/index.html It is intended to do an incremental retrieval from an HBase table. Does this work for your use-case, or are you looking for something different?