Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8191 | 06-19-2018 08:52 AM
 | 3147 | 06-13-2018 07:54 AM
 | 3575 | 06-02-2018 06:27 PM
 | 3887 | 05-01-2018 12:28 PM
 | 5408 | 04-24-2018 11:38 AM
10-07-2017
02:20 PM
1 Kudo
@Vinod Chhabria I found another way to implement this using the UpdateAttribute processor in NiFi 1.3. UpdateAttribute supports state in this version. I didn't test in NiFi 1.2, so I can't tell if this works in previous versions. Here's the global solution:

In the upper stream you have data coming from Cassandra with the new value. I don't have this, so I simulated data coming in JSON format from GenerateFlowFile. This processor triggers every hour to generate a new key; in your case it will be getting data from Cassandra. For data coming from this stream, you basically add two attributes: type = 'update_key' and key = 'the key that you get from Cassandra'. In my case I did it with two processors, UpdateAttribute and EvaluateJsonPath, configured as follows.

On the bottom stream you get your data to encrypt. All you need to do is add an attribute type = 'data' (this is optional). I do it with an UpdateAttribute.

Now both streams go to an UpdateAttribute processor that adds an attribute encryptionkey and stores it in state. We initialize it with an empty value. As you can see, I add the value in the state as this attribute. See the configuration below.

Now what I want to do is update this key in state only when I have a flow file from the upper stream (i.e. type = update_key). To do this, click on Advanced settings in the bottom-left of the UpdateAttribute processor and add the following configuration. With this condition we will be updating the encryption key only once per hour, when new data comes from Cassandra.

After the UpdateAttribute, you can route on the type attribute to drop messages coming from Cassandra (update_key) and encrypt the others. Can you try this and let me know if it works?
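As a rough sketch, the stateful UpdateAttribute configuration described above would look something like this (the attribute names follow the ones used in this answer; getStateValue is the expression language function available when UpdateAttribute stores state):

```
# Basic tab (Store State = "Store state locally")
encryptionkey = ${getStateValue('encryptionkey')}

# Advanced tab
Rule: update-encryption-key
  Condition: ${type:equals('update_key')}
  Action:    encryptionkey = ${key}
```

With this rule, the stored value is only overwritten when a flow file with type = 'update_key' arrives; all other flow files simply pick up the last stored key.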
11-02-2017
02:45 PM
Thanks Abdel!
10-24-2018
07:13 PM
@Bjorn Olsen I tried CSVRecordSetWriter but decimal to int conversion is not working. I am using NiFi 1.5
04-17-2017
12:50 PM
1 Kudo
Hi @Yahya Najjar, you can use the ExtractText processor to extract these fields as attributes. Below is a test I did. This configuration will extract your CSV fields as myfields.1, myfields.2, etc. As you can see in the provenance, this information is added to the flow files as attributes. Below is the complete flow.
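As a sketch, the ExtractText configuration is simply a dynamic property whose regex capture groups become numbered attributes. Assuming a three-column comma-separated file (the regex below is an assumption about the input layout):

```
ExtractText - dynamic property
  myfields : ^([^,]+),([^,]+),([^,]+)$
```

Each capture group is written to its own attribute (myfields.1, myfields.2, myfields.3), which is what shows up in the provenance events.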
03-13-2017
11:26 AM
3 Kudos
Introduction

NiFi Site-to-Site (S2S) is a communication protocol used to exchange data between NiFi instances or clusters. This protocol is useful for use cases where geographically distributed clusters need to communicate. Examples include:

- IoT: collect data from edge nodes (MiNiFi) and send it to NiFi for aggregation/storage/analysis
- Connected cars: collect data locally by city or country with a local HDF cluster, and send it back to a global HDF cluster in the core data center
- Replication: synchronization between two HDP clusters (on-prem/cloud or primary/DR)

S2S provides several benefits such as scalability, security, load balancing and high availability. More information can be found here.

Context

NiFi can be secured by enabling SSL and requiring users/nodes to authenticate with certificates. However, in some scenarios, customers have secured and unsecured NiFi clusters that need to communicate. The objective of this tutorial is to show two approaches to achieve this. Discussions on having secure and unsecured NiFi clusters in the same application are outside the scope of this tutorial.

Prerequisites

Let's assume that we have already installed an unsecured HDF cluster (Cluster2) that needs to send data to a secure cluster (Cluster1). Cluster1 is a 3-node NiFi cluster with SSL: hdfcluster0, hdfcluster1 and hdfcluster2. We can see the HTTPS in the URLs as well as the connected user 'ahadjidj'. Cluster2 is also a 3-node NiFi cluster, but without SSL enabled: hdfcluster20, hdfcluster21 and hdfcluster22.

Option 1: the lazy option
The easiest way to get data from Cluster2 to Cluster1 is to use a pull method. In this approach, Cluster1 will use a Remote Process Group (RPG) to pull data from Cluster2. We will configure the RPG to use HTTP, and no special configuration is required. However, data will go unencrypted over the network. Let's see how to implement this.

Step 1: configure Cluster2 to generate data

- The easiest way to generate data in Cluster2 is to use a GenerateFlowFile processor. Set the File Size to something different from 0 and the Run Schedule to 60 sec.
- Add an output port to the canvas and call it 'fromCluster2'.
- Connect and start the two processors.

At this point, we can see data being generated and queued before the output port.

Step 2: configure Cluster1 to pull data
- Add an RPG and configure it with the HTTP addresses of the three Cluster2 nodes. Use HTTP as the Transport Protocol and enable transmission (the nifi.properties settings behind HTTP site-to-site are sketched below).
- Add a PutFile processor to grab the data. Connect the RPG to the PutFile and choose the 'fromCluster2' output port when asked.
- Right-click on the RPG and activate the toggle next to 'fromCluster2'.
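For reference, HTTP site-to-site relies on a few entries in nifi.properties on the Cluster2 nodes. A minimal sketch, assuming the default HTTP port (the values below are assumptions, not taken from the article):

```
# nifi.properties on each Cluster2 node (sketch)
nifi.web.http.port=8080                # the port used in the RPG URLs, e.g. http://hdfcluster20:8080/nifi
nifi.remote.input.http.enabled=true    # allow site-to-site transfers over HTTP
nifi.remote.input.secure=false         # Cluster2 is not secured
```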
We should see flow files coming from the RPG and buffering before the PutFile processor.

Option 2: the secure option

The first approach was easy to configure, but data was sent unencrypted over the wire. If we want to leverage SSL and send data encrypted between the two clusters, we need to generate and use certificates for each node in Cluster2. The only difference here is that we don't activate SSL on Cluster2.

Step 1: generate and add Cluster2 certs
I assume that you already know how to generate certificates for the CA/nodes and add them to the TrustStore/KeyStore; otherwise, there are several HCC articles that explain how to do it. We need to configure Cluster2 with its certificates:

- Upload each node's certificate to that node and add it to the KeyStore (e.g. keystore.pfx). Also set the KeyStore type and password.
- Upload the CA (Certificate Authority) certificate to each node and add it to the TrustStore (e.g. truststore.jks). Also set the TrustStore type and password.
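A minimal sketch of that configuration on one Cluster2 node, assuming the file names used above and placeholder paths, aliases and passwords:

```
# Add the CA certificate to the truststore (alias, paths and password are placeholders)
keytool -importcert -alias nifi-ca -file ca.pem \
        -keystore /etc/nifi/truststore.jks -storepass changeit

# nifi.properties: reference the node keystore and the common truststore
nifi.security.keystore=/etc/nifi/keystore.pfx
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=changeit
nifi.security.truststore=/etc/nifi/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit
```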
Step 2: configure Cluster2 to push data to Cluster1

In Cluster1, add an input port (toCluster1) and connect it to a PutFile processor. Use a GenerateFlowFile to generate data in Cluster2 and an RPG to push data to Cluster1. Here we will use HTTPS addresses when configuring the RPG. Cluster2 should be able to send data to Cluster1 via the toCluster1 input port. However, the RPG shows a Forbidden error.

Step 3: add policies to authorize Cluster2 to use the S2S protocol
The previous error is triggered because the nodes belonging to Cluster2 are not authorized to access Cluster1 resources. To solve the problem, let's do the following configurations:

1) Go to the Users menu in Cluster1 and add a user for each node from Cluster2.

2) Go to the Policies menu in Cluster1 and add each node from Cluster2 to the 'retrieve site-to-site details' policy. At this point, the RPG in Cluster2 is working; however, the input port is not visible yet.

3) The last step is editing the input port policy in Cluster1 to authorize nodes from Cluster2 to send data through S2S. Select the toCluster1 input port, click on the key to edit its policies, and add the Cluster2 nodes to the list.

4) Now, go back to Cluster2 and connect the GenerateFlowFile to the RPG. The input port should be visible and data starts flowing "securely" 🙂
10-02-2016
02:23 AM
Hi, thanks for your quick reply. A couple of points need clarification:
1. The ApplicationMaster is responsible for getting data block info from the NameNode and creating containers on the respective DataNodes for processing the data.
2. It is also responsible for monitoring the tasks, and if one fails, the ApplicationMaster will start the container on a different DataNode.
08-26-2016
09:53 AM
Hi @Andread B, why do you want to run NiFi on the NameNode? If you are ingesting a lot of data, I would recommend running NiFi on a dedicated host, or at least on an edge node. Also, if you will ingest more data than a single NiFi instance can handle, you can use GenerateTableFetch (coming in NiFi 1.0) to divide your import into several chunks and distribute them over several NiFi nodes. This processor generates several FlowFiles based on the Partition Size property, where each FlowFile is a query to get a part of the data. You can try this by downloading NiFi 1.0 Beta: https://nifi.apache.org/download.html
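A rough sketch of that setup; the pool, table and column names below are made up, only the property names come from the processor:

```
GenerateTableFetch
  Database Connection Pooling Service : my-dbcp-pool     (assumption)
  Table Name                          : transactions     (assumption)
  Maximum-value Columns               : id
  Partition Size                      : 10000
```

Each generated FlowFile then carries one SQL query covering roughly Partition Size rows, so the queries can be distributed across the cluster and executed with ExecuteSQL.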
05-19-2016
07:12 PM
1 Kudo
Again, there will be further maintenance releases on 2.3 that align bug fixes for components in common with future feature-bearing releases such as 2.4, 2.5, etc.
05-13-2016
06:28 PM
Yes, you can @Hemant Kumar Dindi. You can do this by registering the service, its service components, and the host components using the Ambari APIs.
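As a hedged sketch of what those registration calls look like (the cluster, service, component and host names below are placeholders; the endpoints follow the standard Ambari v1 REST layout):

```
# 1. Register the service on the cluster
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
     http://ambari-host:8080/api/v1/clusters/MyCluster/services/MYSERVICE

# 2. Register a component of that service
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
     http://ambari-host:8080/api/v1/clusters/MyCluster/services/MYSERVICE/components/MYSERVICE_SERVER

# 3. Map the component onto a host
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
     http://ambari-host:8080/api/v1/clusters/MyCluster/hosts/node1.example.com/host_components/MYSERVICE_SERVER
```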
05-12-2016
05:23 PM
3 Kudos
Hi @Kirk Haslbeck, you can use "Process Groups" to group several processors into one unit. You can then click to zoom in and out. You can also use labels with background colors to visually differentiate each part.