Member since 09-15-2015
116 Posts
141 Kudos Received
40 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1829 | 02-05-2018 04:53 PM |
 | 2372 | 10-16-2017 09:46 AM |
 | 2058 | 07-04-2017 05:52 PM |
 | 3086 | 04-17-2017 06:44 PM |
 | 2258 | 12-30-2016 11:32 AM |
07-05-2016
03:28 PM
3 Kudos
One way of doing this is to push templates created in a dev instance onto a production instance of NiFi, usually through scripted API calls. NiFi deliberately avoids including sensitive properties like passwords and connection strings in the template files. However, given that these are likely to change in a production environment anyway, this is more a benefit than a drawback. A good way of handling this is to use the API again to populate the production properties once the template has been deployed.
A good starting point is https://github.com/aperepel/nifi-api-deploy, which provides a script, configured with a YAML file, that deploys templates and then updates properties in a production instance. This will be a lot cleaner once the community has completed the variable registry effort, but it gives you a good solution for now. As Joe points out, it is also important to copy up any custom processors you have in NAR bundles, but that is just a file copy and a restart (and, as Joe suggests, they should be kept in a custom folder to make upgrades easier).
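As a rough illustration of the "use the API again to populate production properties" step, here is a minimal Python sketch. It is not how nifi-api-deploy is implemented. It assumes the NiFi 1.x REST API shape, where a processor is read with GET and updated with PUT on `/processors/{id}` and the update must echo back the current `revision`; older 0.x releases use different endpoints, so check the REST API documentation for your version. The base URL and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of an unsecured NiFi instance; adjust for your environment.
NIFI_API = "http://localhost:8080/nifi-api"


def build_property_update(processor_id, revision_version, properties):
    """Build the JSON body for a processor property update (NiFi 1.x shape, assumed)."""
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            "config": {"properties": properties},
        },
    }


def update_processor_properties(processor_id, properties, base_url=NIFI_API):
    """Fetch the processor's current revision, then PUT the new property values."""
    url = f"{base_url}/processors/{processor_id}"
    with urllib.request.urlopen(url) as resp:
        current = json.load(resp)
    body = build_property_update(
        processor_id, current["revision"]["version"], properties
    )
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

A deployment script would call `update_processor_properties` once per processor after instantiating the template, reading the property names and production values from a config file kept outside the template.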
07-05-2016
11:57 AM
5 Kudos
For unit testing custom processors, NiFi has a test framework; see https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2.0.1/bk_DeveloperGuide/content/testing.html for some details about the TestRunner and the nifi-mock package. This lets you build quick mock flows and provides a number of assertions for testing processors, so processor units can be tested in a fully automated way.
If you're looking for more of an integration testing approach to a flow, this is a little more involved. You could use the NiFi framework to programmatically create flows around mock repositories. However, for a full integration test in a live NiFi instance, I would recommend injecting tracer messages or objects at the relevant ingest points and testing against the results, possibly using LogAttribute, or routes based on metadata, to check the integrity of your results. GenerateFlowFile can also be useful for testing flows. To automate these in a CI environment, you might look at deploying templates with a script into a CI-hosted NiFi instance.
06-23-2016
10:07 PM
3 Kudos
This was resolved by removing the state directory on the node. The node was originally configured with incorrect hostnames copied from an existing node's nifi.properties, which led to an incorrect node ID being generated and stored in state. When the host settings were corrected, the old node ID persisted in the state folder. Removing this folder corrected the problem and allowed the node to join the cluster correctly.
06-23-2016
10:04 PM
1 Kudo
When adding a node to a NiFi cluster, I receive an error saying the NCM prevented the connection because the node ID already exists.
Labels: Apache NiFi
05-31-2016
09:28 AM
3 Kudos
The best way to do this is with the new QueryDatabaseTable processor. This has a property which lets you list maximum value columns (usually an auto-increment id, a sequence, or something like a timestamp). NiFi then builds queries in much the same way as a Sqoop incremental import. QueryDatabaseTable also maintains its own state, so there is no need to store the values yourself in the flow logic. This state is stored either locally or, if you are using a NiFi cluster, in ZooKeeper. The processor is usually used directly against a table, but can also be applied against a view if you have a more complex query. If you use this method, you won't have to worry about the empty result set problem. However, one interesting way of dealing with 'missing' data in a NiFi flow is to use the MonitorActivity processor, which can be used to trigger a flow in the absence of data over a given time window. Used with scheduling, this could achieve the logic for your point 2. That said, for this particular use case it is moot, as you can just use the QueryDatabaseTable processor, which does everything for you.
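To make the maximum-value-column idea concrete, here is a small Python sketch of the general incremental-fetch pattern: remember the largest value seen and only ask for rows above it next time. It illustrates the technique only and is not NiFi's implementation. The `fetch_new_rows` function, the in-memory `state` dict and the `events` table in the test data are all hypothetical, with SQLite standing in for the real database.

```python
import sqlite3


def fetch_new_rows(conn, table, max_value_column, state):
    """Return rows whose max-value column exceeds the last value seen.

    `state` is a plain dict standing in for the processor's stored state.
    Table and column names are interpolated directly, so they must come
    from trusted configuration, never from user input.
    """
    last_seen = state.get(max_value_column)
    if last_seen is None:
        # First run: no state yet, so fetch everything.
        sql = f"SELECT * FROM {table} ORDER BY {max_value_column}"
        rows = conn.execute(sql).fetchall()
    else:
        # Later runs: only rows beyond the stored maximum.
        sql = (
            f"SELECT * FROM {table} "
            f"WHERE {max_value_column} > ? ORDER BY {max_value_column}"
        )
        rows = conn.execute(sql, (last_seen,)).fetchall()
    if rows:
        # Rows are ordered, so the last one carries the new maximum.
        state[max_value_column] = rows[-1][max_value_column]
    return rows


def open_demo_db():
    """Create an in-memory database with a hypothetical `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)]
    )
    return conn
```

A run that finds nothing new simply returns an empty list and leaves the state untouched, which is why the empty result set needs no special handling in this pattern.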
05-26-2016
02:23 PM
The way to deal with this is to mark the original relationship as auto-terminated in the SplitContent settings tab.
05-26-2016
10:51 AM
Hi @Thierry Vernhet I've added a template and a screenshot of a worked example, which should make it clearer. I suspect the problem you're seeing is with the relationship being used for output from the SplitContent processor. If you use the original relationship or, worse, both outputs, you will just get the original content back. Note also that I've used the "Leading" location in my template, since the marker is inserted at the front of a line, and have used Line-by-Line evaluation in the ReplaceText that inserts the marker, for better memory usage.
05-24-2016
03:41 PM
2 Kudos
You can do this by using ReplaceText to replace ^(\d{2}\/\d{2}\/\d{4}) with some delimiter not in the set followed by the match (e.g. ~$1), i.e. prepend a magic character to the beginning of each real log line. You can then use SplitContent to split on the byte you chose to prepend. This gives you one flow file per log entry. However, this can be a little heavy. Make sure you're running the latest version of NiFi, and if you're working with large log files, you may need to consider increasing file handle limits. The flow (template here: split-multi-line-example.xml) works for prepending and splitting. You can see here that 2 flowfiles have come out of the 5-line log file sample I put in.
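The same mark-then-split idea can be tried outside NiFi with a few lines of Python. This is only a sketch of the technique: the sample log text is made up, and Python writes the back-reference as `\1` where ReplaceText uses `$1`.

```python
import re

MARKER = "~"  # must be a character that never appears in the log itself

# Hypothetical 5-line sample containing two multi-line log entries.
SAMPLE_LOG = (
    "05/24/2016 ERROR something failed\n"
    "  at com.example.Foo\n"
    "  at com.example.Bar\n"
    "05/24/2016 INFO recovered\n"
    "  details line\n"
)


def split_entries(text, marker=MARKER):
    """Prefix each line that starts with a date, then split on the marker."""
    marked = re.sub(
        r"^(\d{2}/\d{2}/\d{4})", marker + r"\1", text, flags=re.MULTILINE
    )
    # The first piece is empty because the text starts with the marker.
    return [entry for entry in marked.split(marker) if entry]


entries = split_entries(SAMPLE_LOG)
```

With this sample, `entries` holds two strings, one per log entry, each keeping its continuation lines.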
05-23-2016
11:58 AM
2 Kudos
The Spark History Server will have a list of all jobs that have run using the YARN master. If you are looking for currently running jobs, the ResourceManager (RM) will give you a full list, though this will of course also include non-Spark jobs running on your cluster. If you are running Spark standalone, you will not have any means of listing jobs.
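If you want only the Spark entries from the RM, its REST API can filter by state and application type. The sketch below assumes the Cluster Applications endpoint (`/ws/v1/cluster/apps`) with its `states` and `applicationTypes` query parameters, and that Spark applications are registered with the type `SPARK`. The RM address you pass in and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def running_spark_apps_url(rm_base_url):
    """Build the RM REST URL for running applications of type SPARK."""
    query = urllib.parse.urlencode(
        {"states": "RUNNING", "applicationTypes": "SPARK"}
    )
    return f"{rm_base_url}/ws/v1/cluster/apps?{query}"


def extract_apps(payload):
    """Pull (id, name) pairs out of an RM response.

    The RM returns "apps": null when nothing matches, hence the fallbacks.
    """
    apps = (payload.get("apps") or {}).get("app") or []
    return [(app["id"], app["name"]) for app in apps]


def list_running_spark_apps(rm_base_url):
    """Query the ResourceManager and return the running Spark applications."""
    with urllib.request.urlopen(running_spark_apps_url(rm_base_url)) as resp:
        return extract_apps(json.load(resp))
```

Calling `list_running_spark_apps` with your RM web address (commonly port 8088) returns a list of application id and name pairs.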
05-20-2016
09:57 AM
Could you provide a snippet from your nifi-app.log with the stack trace for this error? I suspect the problem is that your hadoop-azure.jar is built against the wrong version of Hadoop. Where did you get this file from?