Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4182 | 12-03-2018 02:26 PM |
| | 3148 | 10-16-2018 01:37 PM |
| | 4266 | 10-03-2018 06:34 PM |
| | 3100 | 09-05-2018 07:44 PM |
| | 2363 | 09-05-2018 07:31 PM |
09-11-2016
04:02 PM
1 Kudo
I don't think MergeContent supports multiple attribute names for the Correlation Attribute Name property, but you could put an UpdateAttribute processor right before it to create a new attribute, e.g. combined = ${attribute1}_${attribute2}, and then use "combined" as the correlation attribute in MergeContent.
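A minimal sketch of the two processor configurations, assuming your two attributes are literally named attribute1 and attribute2:

```
UpdateAttribute
  combined = ${attribute1}_${attribute2}

MergeContent
  Merge Strategy             = Bin-Packing Algorithm
  Correlation Attribute Name = combined
```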
09-10-2016
07:29 PM
1 Kudo
If I'm understanding the scenario correctly, you would probably do something like this:

- ExecuteSQL/QueryDatabaseTable to get data from the database, producing Avro
- ConvertAvroToJSON or ConvertAvroToCSV (I'm going to use JSON going forward)
- SplitJson to split each record into its own flow file
- EvaluateJsonPath to extract DoA, DoB, and DoC into flow file attributes

From here it depends on the logic you want and on whether those three fields are mutually exclusive (only one is ever true) or whether 2 out of 3 can be true. Either way, you would use RouteOnAttribute with a property like DoA = ${DoA:equals("true")} to send everything that matches to that relationship, and then send that relationship to the processors that should run when DoA is true. You could have a series of RouteOnAttribute processors, or one with complex statements like ${DoA:equals("true"):and( ${DoB:equals("false")} )} - see the sketch below. You can take a look at the expression language guide for more detail on constructing the right expressions: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
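Roughly, the two key processors might be configured like this (assuming DoA/DoB/DoC are top-level fields in each JSON record):

```
EvaluateJsonPath
  Destination = flowfile-attribute
  DoA = $.DoA
  DoB = $.DoB
  DoC = $.DoC

RouteOnAttribute
  Routing Strategy = Route to Property name
  DoA = ${DoA:equals("true")}
  DoB = ${DoB:equals("true")}
  DoC = ${DoC:equals("true")}
```

Each property you add to RouteOnAttribute becomes its own relationship, which you can wire to the appropriate downstream processors.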
09-09-2016
05:53 PM
1 Kudo
@INDRANIL ROY Here is a template that shows how to get the data formatted properly: delimitedtohbase.xml. The first two processors (GenerateFlowFile and ReplaceText) are just creating fake data every 30 seconds; you would replace them with wherever your data is actually coming from.
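For reference, the fake-data portion amounts to something like this (the sample row is made up for illustration):

```
GenerateFlowFile   (Run Schedule: 30 sec)

ReplaceText
  Replacement Strategy = Always Replace
  Replacement Value    = 1|John|Smith|555-1234
```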
09-09-2016
12:25 PM
Just to clarify, you used SplitText to split the lines of the CSV, right? If so, one thing you can try in NiFi is to send the success relationship of PutCassandraQL to a MergeContent processor and set MergeContent's "Merge Strategy" to "Defragment". Defragment mode merges together all flow files that have the same value for an attribute called "fragment.identifier". SplitText writes three attributes on all the child flow files it creates - "fragment.identifier", "fragment.count", and "fragment.index" - so your flow files can be defragmented by MergeContent. Once MergeContent has defragmented them, you can do whatever you want to notify the user, maybe with the PutEmail processor.
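A minimal sketch of that part of the flow, relying entirely on the fragment.* attributes SplitText already writes:

```
PutCassandraQL (success) -> MergeContent
  Merge Strategy = Defragment
    (groups on fragment.identifier and holds the bin
     until all fragment.count flow files have arrived)
-> PutEmail (or whatever notification you want)
```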
09-08-2016
07:56 PM
This post is written against Apache NiFi 0.6.1... in that version, when you enable the RPG and it makes a request to the secure instance, it automatically makes an account request for the DN being used by the RPG (mycert.p12). Then someone has to go into the other instance's UI, approve that account, and give it the NiFi role, which behind the scenes updates authorized_users.xml. This is a bit different in Apache NiFi 1.0.0... there are no more automatic account requests, and the authorization model is very different. Refer to this blog post for how to set up the authorizations: http://bryanbende.com/development/2016/08/30/apache-nifi-1.0.0-secure-site-to-site
09-08-2016
05:59 PM
1 Kudo
It depends on what you want to do; there are a lot of options... If you want to store one line of piped values as the value of a cell, you could use SplitText with a line count of 1 to get each line into its own flow file, then send each of those to PutHBaseCell and set the Row Id property to something unique like ${uuid} or whatever you want.

If you want the piped values to represent multiple cells within one row, then you need to convert each line of piped text to JSON somehow. You probably still need to split each line as described above, then use something like ExtractText and ReplaceText to create JSON (https://github.com/hortonworks-gallery/nifi-templates/blob/master/templates/csv-to-json-flow.xml), or use the ExecuteScript processor with a Groovy or Python script that converts your piped line to JSON, as in the sketch below.
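If you go the ExecuteScript route, a minimal Jython body might look like this (the column names col1/col2/col3 are placeholders for whatever your piped fields actually are):

```python
import json
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback

class PipedToJson(StreamCallback):
    # Rewrites the flow file content: one pipe-delimited line in,
    # one flat JSON document out (suitable for PutHBaseJSON).
    def process(self, inputStream, outputStream):
        line = IOUtils.toString(inputStream, StandardCharsets.UTF_8).strip()
        fields = line.split('|')
        doc = {'col1': fields[0], 'col2': fields[1], 'col3': fields[2]}
        outputStream.write(bytearray(json.dumps(doc).encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PipedToJson())
    session.transfer(flowFile, REL_SUCCESS)
```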
09-08-2016
05:24 PM
4 Kudos
There is no local authentication. NiFi provides authentication with 2-way SSL (certificates), Kerberos, or LDAP. You can also implement your own identity provider to authenticate users however you like: https://github.com/apache/nifi/blob/d1129706e235548daaf4eecf7001b244300761e9/nifi-framework-api/src/main/java/org/apache/nifi/authentication/LoginIdentityProvider.java You would create a NAR with your implementation and put it in NiFi's lib directory.
09-08-2016
05:02 PM
If you use PutHBaseCell with a FlowFile that has 50 lines, all 50 lines will be written as the value of one cell (row id, column family, column qualifier). PutHBaseCell has no idea what the content of the FlowFile is; it takes the content as a byte[] and sticks it in one cell.
09-08-2016
01:46 PM
5 Kudos
PutHBaseCell is used to write a single cell to HBase; it uses the content of the FlowFile as the value of the cell, and the column family and column qualifier come from properties on the processor. If you want to write multiple values, then you would want to use PutHBaseJSON, which takes a flat JSON document and uses the field names as column qualifiers and the value of each field as the value for that column qualifier. The column family is a property on the processor. It doesn't support writing to multiple column families, so you would need to take your original data and split it into two JSON documents, one for column family 1 and one for column family 2 (see the sketch below). You could then have two PutHBaseJSON processors, one per column family, or you could have a single one with the column family set to ${col.family} and set a "col.family" attribute on each flow file upstream to specify which column family goes with that flow file.
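A quick illustration of the split in plain Python (the record and the field-to-family mapping are made up for the example):

```python
import json

# One original record whose fields belong to two column families
record = {'name': 'alice', 'age': 30, 'city': 'nyc', 'zip': '10001'}

# Assumed mapping of fields to column families
cf1_fields = ('name', 'age')
cf2_fields = ('city', 'zip')

# Two flat JSON documents, one per column family; each would be sent
# to PutHBaseJSON (with a matching col.family attribute if you use a
# single processor with Column Family = ${col.family})
doc_cf1 = json.dumps({k: record[k] for k in cf1_fields})
doc_cf2 = json.dumps({k: record[k] for k in cf2_fields})
print(doc_cf1)  # {"name": "alice", "age": 30}
print(doc_cf2)  # {"city": "nyc", "zip": "10001"}
```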
09-08-2016
01:19 PM
2 Kudos
This is actually a little bit challenging to do right now... I'm assuming that after you get the CSV from Kafka, you used SplitText to split it into individual lines and then converted each of those to CQL somehow? There currently isn't a great way to confirm that all of the flow files from the original CSV have reached a certain point in the flow, but there are a few JIRA tickets open for something like this. The idea would be to have a processor that could act as a barrier after the Cassandra processor and wait for all N flow files before allowing anything to proceed; at that point you could then send a notification to the user. One of the relevant JIRAs is this one: https://issues.apache.org/jira/browse/NIFI-1926