Member since: 07-30-2019
Posts: 155
Kudos Received: 106
Solutions: 33
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2953 | 04-18-2019 08:27 PM
| 949 | 12-31-2018 07:36 PM
| 1838 | 12-03-2018 06:47 PM
| 652 | 06-02-2018 02:35 AM
| 1692 | 06-02-2018 01:30 AM
11-17-2016
08:04 PM
1 Kudo
If the token can be retrieved from an external resource via static credentials, you can use a separate InvokeHTTP processor to perform the authentication and load the token into flowfile content, which is then fed to the follow-on InvokeHTTP processor to perform the actual request.
@Pierre Villard has a good example of using OAuth 1.0 to generate credential material and then use that in a follow-on request.
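As a rough shell analogy of that two-processor pattern (the endpoints and credentials below are hypothetical placeholders):
$ TOKEN=$(curl -s -X POST -u 'svc-user:svc-pass' https://auth.example.com/oauth/token)
$ curl -s -H "Authorization: Bearer ${TOKEN}" https://api.example.com/resource
The first call corresponds to the authentication InvokeHTTP, the second to the follow-on request.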
11-17-2016
08:01 PM
Also ensure that the keystore files have the correct permissions to be readable by the user running NiFi.
11-17-2016
08:00 PM
The error " org.xml.sax.SAXParseException; systemId: file:/F:/Tools/HDF-2.0.1.0/nifi/./conf/users.xml; lineNumber: 1; columnNumber: 1; Premature end of file. " indicates that the users.xml file was either empty or contained only comments.
11-16-2016
01:55 AM
3 Kudos
If you are not using the NiFi CA, you can still secure your HDF instances by providing each with resources meeting the following requirements:
Keystore
The keystore must contain a PrivateKeyEntry holding the private key and public certificate, with valid dates and a DN matching the fully-qualified domain name (FQDN) of the host; if the certificate is signed by another key (e.g. a CA), the entry's certificate chain must also include that issuer's public certificate
Truststore
The truststore must contain a trustedCertEntry containing the public certificate of each authorized user or the CA used to sign the individual certificates.
The nifi.properties file must contain the path to each keystore and truststore and the corresponding password to access each.
To configure LDAP authentication, follow the same steps as for a standalone instance; the nifi.properties and login-identity-providers.xml files must be synchronized across all nodes in the cluster.
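To sanity-check the stores before wiring them into nifi.properties (file names are placeholders; keytool prompts for the store password):
$ keytool -list -v -keystore keystore.jks
(expect "Entry type: PrivateKeyEntry" and an Owner DN matching the host FQDN)
$ keytool -list -v -keystore truststore.jks
(expect one or more "Entry type: trustedCertEntry" entries)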
11-11-2016
06:11 PM
@slachterman you are correct and I have updated my answer to reflect this. Thanks.
11-10-2016
06:35 PM
Sunile,
The Dataflow Manager is a role assumed by a person who builds, configures, and monitors the data flow, via either the REST API or the UI. The UI is exposed on all nodes in Apache NiFi 1.0.0+ (HDF 2.0+). Previously, in a clustered environment, only the NiFi Cluster Manager (NCM) exposed the UI. This may be the source of your confusion.
11-10-2016
05:44 AM
I am not aware of a setting to specify a number of retry attempts. If you think this would be a common requirement, please file a Jira requesting the feature for Apache NiFi. You can route data through a series of processors a specified number of times using a loop as demonstrated by Koji Kawamura here. You can also pair an UpdateAttribute processor with your PutHDFS so that, when following the failure relationship, an attribute is added/updated indicating the number of failed tries; you can then route repeatedly failing data to your fallback flow (PutFile and PutEmail, etc.), as sketched below.
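A rough sketch of that counter pattern (the attribute name and threshold are arbitrary):
UpdateAttribute (fed by the PutHDFS failure relationship):
retry.count = ${retry.count:replaceNull(0):plus(1)}
RouteOnAttribute:
exhausted = ${retry.count:ge(3)}
Route the "exhausted" relationship to your fallback flow and loop "unmatched" back to PutHDFS.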
Flowfile expiration refers to the age (amount of time since the flowfile entered NiFi) before the flowfile should be dropped, so it is not an ideal fit for this use case.
11-04-2016
05:33 PM
Can you please post the complete output of curl or whatever REST client you are using? The complete URI to use is http(s)://host:port/nifi-api/provenance-events/{id}.
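For example (placeholder host, port, and event ID; add credentials or a token as your security configuration requires):
$ curl -k -v https://nifi-host:8443/nifi-api/provenance-events/42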
11-03-2016
12:35 AM
Hi Ben, can you please provide the Maven command you ran and the entire output (this will be long, so posting as a GitHub Gist can be helpful)? That dependency is brought in by the Hadoop jars, but it should be available to you.
11-02-2016
06:38 PM
You should be able to use the ReplaceText processor to change the \u0001 delimiter to whatever you like (newline, comma, etc.) and then use ConvertCSVToAvro with a literal CSV delimiter. I haven't tried it, but does the CSV delimiter need to be a regular expression to match the Unicode character, or is it just a matter of escaping the \ in the literal?
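In ReplaceText's Search Value (a Java regular expression), \u0001 or \x{0001} should match the control character. For a quick sanity check of the same swap outside NiFi (file names are illustrative; \001 is the octal escape for U+0001):
$ tr '\001' ',' < data.ctrl-a.txt > data.csv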
10-31-2016
06:08 PM
Bojan, if you are using Apache NiFi 1.0.0 or later, use this guide by @Bryan Bende.
10-29-2016
03:53 AM
1 Kudo
Hi @Timothy Spann, take a look at this answer I just provided for a similar question.
You should also look at the SiteToSiteProvenanceReportingTask, which lets you "export" events from the NiFi provenance repository and ingest them into a NiFi instance (remote or the same one) as ordinary data, which you can then manipulate like any other data (e.g. feed to PutHDFS).
10-29-2016
03:50 AM
1 Kudo
@Greg Keys unfortunately I think you are correct that ExecuteScript is the best way to achieve this right now. As far as I know, the PutFile processor cannot append to an existing file. You are given the option to deal with conflicting files using "replace", "ignore", or "fail" as a resolution strategy. You should submit an Apache Jira to add this functionality. I could see difficulties with file locks and flushing the buffer given the streaming nature of NiFi and I think further investigation is needed.
10-29-2016
03:38 AM
5 Kudos
You can send a POST request to https://host:port/nifi-api/provenance, which will submit a provenance query. The contents of this request should be {"provenance":{"request":{"maxResults":1000}}} (with a configurable count). The response will contain an identifier for the query (as it may take a long time to execute). You can then submit a GET request to https://host:port/nifi-api/provenance/{query-id}, which will respond with the results of the query. From this response, you can iterate/extract specific provenance event IDs and request more information using the GET /provenance-events/{id} method. You can also add additional filters and search terms in the initial query to refine it (see GET /provenance/search-options for available options).
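The round trip with curl looks roughly like this (authentication omitted; substitute your own host, port, and query ID):
$ curl -k -X POST -H 'Content-Type: application/json' -d '{"provenance":{"request":{"maxResults":1000}}}' https://host:port/nifi-api/provenance
$ curl -k https://host:port/nifi-api/provenance/{query-id}
$ curl -k -X DELETE https://host:port/nifi-api/provenance/{query-id}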
Example response to search options (GET https://nifi.nifi.apache.org:9443/nifi-api/provenance/search-options):
{"provenanceOptions":{"searchableFields":[{"id":"EventType","field":"eventType","label":"Event Type","type":"STRING"},{"id":"FlowFileUUID","field":"uuid","label":"FlowFile UUID","type":"STRING"},{"id":"Filename","field":"filename","label":"Filename","type":"STRING"},{"id":"ProcessorID","field":"processorId","label":"Component ID","type":"STRING"},{"id":"Relationship","field":"relationship","label":"Relationship","type":"STRING"}]}}
Example response to initial query submission (POST https://nifi.nifi.apache.org:9443/nifi-api/provenance):
{"provenance":{"id":"0e74ca7f-0158-1000-e780-8ec16cb486ba","uri":"https://nifi.nifi.apache.org:9443/nifi-api/provenance/0e74ca7f-0158-1000-e780-8ec16cb486ba","submissionTime":"10/28/2016 20:21:24.866 PDT","expiration":"10/28/2016 20:51:24.868 PDT","percentCompleted":0,"finished":false,"request":{"searchTerms":{},"maxResults":1000},"results":{"provenanceEvents":[],"total":"0","totalCount":0,"generated":"20:21:24 PDT","oldestEvent":"10/28/2016 20:14:44 PDT","timeOffset":-25200000}}}
Example response to query update request (GET https://nifi.nifi.apache.org:9443/nifi-api/provenance/0e74ca7f-0158-1000-e780-8ec16cb486ba):
{"provenance":{"id":"0e74ca7f-0158-1000-e780-8ec16cb486ba","uri":"https://nifi.nifi.apache.org:9443/nifi-api/provenance/0e74ca7f-0158-1000-e780-8ec16cb486ba","submissionTime":"10/28/2016 20:21:24.866 PDT","expiration":"10/28/2016 20:51:24.876 PDT","percentCompleted":100,"finished":true,"request":{"searchTerms":{},"maxResults":1000},"results":{"provenanceEvents":[{"id":"13","eventId":13,"eventTime":"10/28/2016 20:15:43.606 PDT","lineageDuration":11,"eventType":"DROP","flowFileUuid":"66d1354f-1c0d-4658-9263-17d77ef741df","fileSize":"0 bytes","fileSizeBytes":0,"groupId":"0947f405-0158-1000-d643-3299cb111b40","componentId":"0e6f37b9-0158-1000-254c-d79aeb605761","componentType":"LogAttribute","componentName":"LogAttribute","attributes":[{"name":"filename","value":"1234593201638265","previousValue":"1234593201638265"},{"name":"path","value":"./","previousValue":"./"},{"name":"uuid","value":"66d1354f-1c0d-4658-9263-17d77ef741df","previousValue":"66d1354f-1c0d-4658-9263-17d77ef741df"}],"parentUuids":[],"childUuids":[],"details":"Auto-Terminated by success Relationship","contentEqual":false,"inputContentAvailable":false,"outputContentAvailable":false,"outputContentClaimFileSize":"0 bytes","outputContentClaimFileSizeBytes":0,"replayAvailable":false,"replayExplanation":"Cannot replay data from Provenance Event because the event does not contain the required Content Claim","sourceConnectionIdentifier":"0e6f4ed9-0158-1000-0ab4-97a391fb36b8"},
...
{"id":"0","eventId":0,"eventTime":"10/28/2016 20:14:44.820 PDT","lineageDuration":10,"eventType":"CREATE","flowFileUuid":"3e5d9a82-f185-48d5-b0e4-f3b818f81a13","fileSize":"0 bytes","fileSizeBytes":0,"groupId":"0947f405-0158-1000-d643-3299cb111b40","componentId":"0e6e6df1-0158-1000-e997-ec080b4cdd98","componentType":"GenerateFlowFile","componentName":"GenerateFlowFile","attributes":[{"name":"filename","value":"1234534415887832"},{"name":"path","value":"./"},{"name":"uuid","value":"3e5d9a82-f185-48d5-b0e4-f3b818f81a13"}],"parentUuids":[],"childUuids":[],"contentEqual":false,"inputContentAvailable":false,"outputContentAvailable":false,"outputContentClaimFileSize":"0 bytes","outputContentClaimFileSizeBytes":0,"replayAvailable":false,"replayExplanation":"Cannot replay data from Provenance Event because the event does not contain the required Content Claim"}],"total":"14","totalCount":14,"generated":"20:21:25 PDT","oldestEvent":"10/28/2016 20:14:44 PDT","timeOffset":-25200000}}}
Response to query delete request when processing is complete (DELETE https://nifi.nifi.apache.org:9443/nifi-api/provenance/0e74ca7f-0158-1000-e780-8ec16cb486ba):
{}
10-27-2016
05:12 PM
Mark, I'm glad the answer helped you. You should open a new question for the user permission issue and I will take a look.
10-27-2016
12:22 AM
3 Kudos
Mark,
The certificate you purchased from a certificate authority will identify the NiFi application. Depending on the format it is in (likely a *.key file containing the private key, which never left your computer, and a *.pem or *.der file containing the corresponding public certificate, which was signed by the CA via a CSR (Certificate Signing Request)), you will need to build the following files:
Keystore
This will contain the private key and public key certificate with the issuing CA's public certificate in a chain (as a privateKeyEntry) [see example output below]
Truststore
This will contain the public certificate of your client (if using client certificates) in order to authenticate you as a user connecting to the UI/API.
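If you have the separate key and certificate files described above, one common path is to bundle them into a PKCS12 file with openssl and convert that to a JKS keystore (file names are placeholders):
$ openssl pkcs12 -export -name nifi -inkey nifi.key -in nifi.pem -certfile ca_chain.pem -out nifi.p12
$ keytool -importkeystore -srckeystore nifi.p12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS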
Alternate example using keytool:
You generate a public/private keypair using the Java keytool:
$ keytool -genkey -alias nifi -keyalg RSA -keysize 2048 -keystore keystore.jks
You then export a certificate signing request which you send to the certificate authority:
$ keytool -certreq -alias nifi -keyalg RSA -file nifi.csr -keystore keystore.jks
You will get a CSR file nifi.csr which you send to the CA, and they provide a signed public certificate (and the public certificate of the CA) back cert_from_ca.pem :
$ keytool -import -trustcacerts -alias nifi -file cert_from_ca.pem -keystore keystore.jks
Here is a link to the full steps I ran (I ran my own CA in another terminal to simulate the actions of the external CA) and the resulting output.
10-25-2016
06:17 PM
2 Kudos
As NiFi uses Jetty internally for its web server capabilities, you could try using a HeaderPatternRule as described here to enable HSTS (HTTP Strict Transport Security), which forces HTTPS-only connections. Browsers respond to the provided Strict-Transport-Security header and know to attempt an HTTPS connection.
This isn't directly supported by NiFi though, so you would have to modify code in the application. There is an existing Apache Jira (NIFI-2437) to enable this through a NiFi configuration setting.
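For reference, the response header such a rule would inject looks like this (max-age is in seconds; one year shown):
Strict-Transport-Security: max-age=31536000; includeSubDomains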
10-14-2016
04:07 PM
Frank, yes, you are correct that replacing the flow definition requires restarting the server. The Variable Registry wiki page is a work in progress, as is the development effort. @Yolanda M. Davis has done significant work on this, and more information is available in the Getting Started Guide and Admin Guide.
10-13-2016
11:06 PM
4 Kudos
Hi Frank,
The development/QA/production environment promotion process (sometimes referred to as "SDLC" or "D2P" in conversation) is a topic of much discussion amongst the HDF development team. Currently, there are plans to improve this process in a future release. For now, I will discuss some common behaviors/workflows that we have seen.
The $NIFI_HOME/conf/flow.xml.gz file contains the entire flow serialized to XML. This file contains all processor configuration values, even sensitive values (encrypted). With the new Variable Registry effort, you can refer to environment-specific variables transparently, and promote the same flow between environments without having to update specific values in the flow itself.
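For example, in the NiFi 1.x line each environment can point at its own custom properties file (the file name and key below are illustrative):
In nifi.properties: nifi.variable.registry.properties=./conf/dev-env.properties
In conf/dev-env.properties: db.connection.url=jdbc:postgresql://dev-db:5432/app
In the processor property, identical in every environment: ${db.connection.url}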
The XML flow definition or specific templates can be committed and versioned using Git (or any other source code control tool). Recent improvements like "deterministic template diffs" have made this versioning easier.
The NiFi REST API can be used to "automate" the deployment of a template or flow to an instance of NiFi.
A script (Groovy, Python, etc.) could be used to integrate with both your source code control tool and your various NiFi instances to semi-automate this process (e.g. tap into Git hooks detecting a commit, and promote automatically to the next environment), but you probably want some human interaction to remain for verification at each stage.
We understand that the current state of NiFi is not ideal for the promotion of the flow between dev/QA/prod environments. There are ongoing efforts to improve this, but I can't describe anything concrete at this time. If these points raise specific questions or you think of something else, please follow up.
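As a concrete sketch of the REST-based deployment mentioned above (endpoints from the NiFi 1.x API; host, IDs, and file names are placeholders):
$ curl -k -X POST -F template=@my-flow.xml https://nifi-prod:8443/nifi-api/process-groups/{process-group-id}/templates/upload
$ curl -k -X POST -H 'Content-Type: application/json' -d '{"templateId":"{template-id}","originX":0.0,"originY":0.0}' https://nifi-prod:8443/nifi-api/process-groups/{process-group-id}/template-instance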
10-10-2016
07:11 PM
1 Kudo
Hi,
I'm assuming that you are using multiple capture groups to extract each piece of information. Can you explain what "it is not working" looks like in your situation? Is it capturing nothing, capturing different values than you expected, or throwing an exception? One possibility is that your expression is not focused enough -- if that is the complete expression, it would capture "133" first (as well as "199" and "040" before getting to "200"). If you know the log format will remain consistent, you might want to try something like HTTP\/\d\.\d" (\d{3}) . Please let us know if you have any more information and if this solves your problem.
Update: I tested this expression and was able to get the following output:
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Mon Oct 10 12:18:27 PDT 2016'
Key: 'lineageStartDate'
Value: 'Mon Oct 10 12:18:27 PDT 2016'
Key: 'fileSize'
Value: '115'
FlowFile Attribute Map Content
Key: 'HTTP response'
Value: '200'
Key: 'HTTP response.0'
Value: 'HTTP/1.0" 200'
Key: 'HTTP response.1'
Value: '200'
Key: 'filename'
Value: '787130965602970'
Key: 'path'
Value: './'
Key: 'uuid'
Value: 'ccb6f333-de33-4037-9a1a-aa9ce7f2ef32'
--------------------------------------------------
133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566
I uploaded the template I used here: ExtractText Regex Template.
09-28-2016
06:47 PM
2 Kudos
Riccardo, I'm sorry you were having this problem, but I just wanted to say thank you for writing such a complete and detailed question. Providing the list of things you have already tried and specific expectations makes answering it much easier for everyone involved. It definitely cuts down on the mental gymnastics of trying to estimate the experience level and comprehension of a user we haven't communicated with before.
09-26-2016
06:18 PM
I would not recommend using haveged without fully understanding the issue of getting sufficiently unpredictable random input for security purposes. Multiple well-credentialed security experts have weighed in with concerned, if not dismissive, responses.
Michael Kerrisk:
Having read a number of papers about HAVEGE, Peter [Anvin] said he had been unable to work out whether this was a "real thing". Most of the papers that he has read run along the lines, "we took the output from HAVEGE, and ran some tests on it and all of the tests passed". The problem with this sort of reasoning is the point that Peter made earlier: there are no tests for randomness, only for non-randomness.
One of Peter's colleagues replaced the random input source employed by HAVEGE with a constant stream of ones. All of the same tests passed. In other words, all that the test results are guaranteeing is that the HAVEGE developers have built a very good PRNG. It is possible that HAVEGE does generate some amount of randomness, Peter said. But the problem is that the proposed source of randomness is simply too complex to analyze; thus it is not possible to make a definitive statement about whether it is truly producing randomness. (By contrast, the HWRNGs that Peter described earlier have been analyzed to produce a quantum theoretical justification that they are producing true randomness.) "So, while I can't really recommend it, I can't not recommend it either." If you are going to run HAVEGE, Peter strongly recommended running it together with rngd, rather than as a replacement for it.
Tom Leek:
Of course, the whole premise of HAVEGE is questionable. For any practical security, you need a few "real random" bits, no more than 200, which you use as seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of pseudo-[data] indistinguishable from true randomness, and that's good enough for all practical purposes.
Insisting on going back to the hardware for every bit looks like yet another outbreak of that flawed idea which sees entropy as a kind of gasoline, which you burn up when you look at it.
I would recommend directing the JVM to read from /dev/urandom. In response to the concerns above, I'm not sure what "It's not guaranteed to always work" means, but the other issues are mitigated by providing a Java parameter in conf/bootstrap.conf, as shown below.
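Something like the following line in conf/bootstrap.conf (any unused java.arg index works; the extra /./ works around an old JVM quirk that otherwise ignores file:/dev/urandom):
java.arg.20=-Djava.security.egd=file:/dev/./urandom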
09-22-2016
07:26 PM
If you prefer not to follow this path, you could chain together an EvaluateJsonPath processor and an UpdateAttribute processor and use the Expression Language mathematical operators to calculate these values as well.
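A rough sketch, assuming a JSON document with whole-number fields (the paths and attribute names are illustrative):
EvaluateJsonPath (Destination = flowfile-attribute):
price = $.order.price
tax = $.order.tax
UpdateAttribute:
total = ${price:toNumber():plus(${tax:toNumber()})}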
09-22-2016
07:22 PM
3 Kudos
Hi @Obaid Salikeen, My suggestion would be to use the ExecuteScript processor with your scripting language of choice (Groovy, Python, Ruby, Scala, JavaScript, and Lua are all supported). With Groovy, for example, this would be approximately 4 lines -- use a JsonSlurper to parse the JSON and extract the value(s) you are interested in, then use any combination of collect, sum, average, etc. to perform the desired mathematical operation, and return this in the OutputStream.
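A minimal Groovy sketch of that approach, assuming the flowfile content is a JSON array of objects with a numeric "value" field (the field name is illustrative):
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    // parse the incoming JSON array and sum the "value" field of each record
    def records = new JsonSlurper().parse(inputStream)
    def sum = records.sum { it.value }
    outputStream.write(sum.toString().getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)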
@Matt Burgess has some good examples of using ExecuteScript with JSON arrays here.
09-20-2016
06:48 PM
That is literally the use case for MiNiFi. MiNiFi requires far fewer resources and has no UI, but it can accept flows designed using the NiFi UI on the same or a separate machine.
09-17-2016
04:17 AM
2 Kudos
Hi @Mohit Sharma, You should look at the ExecuteScript and InvokeScriptedProcessor processors. Both of these processors can execute code written in JavaScript directly from NiFi. @Matt Burgess has written very helpful blog posts on this functionality.
08-30-2016
04:15 AM
2 Kudos
David, While "individual canvases" are not available, NiFi 1.0.0 (released today) introduces the concept of multi-tenant authorization (MTA). This allows extremely granular security access controls for groups and users over components, process groups, controller services, etc. Using MTA, an administrator can provide multiple process groups at the root process group level and apply different read and write permissions to each, effectively allowing users to access their "own canvases" without being able to view or modify another user's process group. The official NiFi documentation on the website should be updated within the next 24 hours to reflect the new behavior, but for now you can download and build NiFi 1.0.0 and then click "Help" from the "hamburger menu" in the upper right corner of the canvas. The Admin Guide and User Guide have extensive documentation on using this feature.
07-14-2016
12:07 AM
This question is very nebulous. Please describe the Hortonworks product name (HDP, HDF, etc.) you are using as well as the version, and describe the specific problem you are trying to solve. This will help people answer your question.
06-23-2016
06:35 PM
2 Kudos
Yes, the connection cannot be deleted while it may still deliver data to a downstream component. The source and destination components must be stopped (and the queue emptied) before the connection can be deleted.
03-03-2016
07:09 PM
1 Kudo
It appears Hue uses Django for its web server, which can either run with its own embedded server (recommended for dev/test only) or sit behind Apache httpd (recommended for production). To disable SSLv2/v3 in Apache, add the following directive to each VirtualHost block in httpd.conf or ssl.conf:
SSLProtocol all -SSLv2 -SSLv3