Member since: 09-15-2015
Posts: 116
Kudos Received: 141
Solutions: 40

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1810 | 02-05-2018 04:53 PM
 | 2337 | 10-16-2017 09:46 AM
 | 2033 | 07-04-2017 05:52 PM
 | 3042 | 04-17-2017 06:44 PM
 | 2227 | 12-30-2016 11:32 AM
07-05-2016
03:28 PM
3 Kudos
One way of doing this is to push templates created in a dev instance onto a production instance of NiFi, usually through scripted API calls. NiFi deliberately avoids including sensitive properties such as passwords and connection strings in the template files; however, since these are likely to change in a production environment anyway, this is more a benefit than a drawback. A good way of handling this is to use the API again to populate the production properties in the template once it has been deployed.
A good starting point is https://github.com/aperepel/nifi-api-deploy, which provides a script, configured with a YAML file, to deploy templates and then update properties in a production instance. This will obviously become a lot cleaner once the community has completed the variable registry effort, but it provides a good solution for now. As Joe points out, it is also important to copy up any custom processors you have in NAR bundles, but that is just a file copy and restart (and they should be kept in a custom folder, as Joe suggests, to make upgrades easier).
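To make the "scripted API calls" idea concrete, here is a minimal sketch against the NiFi 1.x REST API. The host, process group id, template file name, and template id below are placeholder values, and the exact endpoints may differ on older NiFi releases; the nifi-api-deploy script above handles this, plus the property updates, for you.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class TemplateDeploySketch {

    // Placeholder host and process group id: point these at your production instance.
    private static final String NIFI = "http://prod-nifi:8080/nifi-api";
    private static final String PROCESS_GROUP_ID = "your-process-group-id";

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Upload the template XML exported from the dev instance.
        //    The upload endpoint takes multipart/form-data with a part named "template".
        String templateXml = Files.readString(Path.of("my-flow-template.xml"));
        String boundary = "----nifi-template-upload";
        String body = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"template\"; filename=\"my-flow-template.xml\"\r\n"
                + "Content-Type: application/xml\r\n\r\n"
                + templateXml + "\r\n"
                + "--" + boundary + "--\r\n";

        HttpRequest upload = HttpRequest.newBuilder()
                .uri(URI.create(NIFI + "/process-groups/" + PROCESS_GROUP_ID + "/templates/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        System.out.println(client.send(upload, HttpResponse.BodyHandlers.ofString()).body());

        // 2. Instantiate the uploaded template onto the canvas.
        //    The template id would normally be parsed out of the upload response above.
        String templateId = "template-id-parsed-from-upload-response";
        String instantiate = "{\"templateId\":\"" + templateId + "\",\"originX\":0.0,\"originY\":0.0}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(NIFI + "/process-groups/" + PROCESS_GROUP_ID + "/template-instance"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(instantiate))
                .build();
        System.out.println(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

A further call in the same style can then PUT updated processor properties (connection strings, passwords) into the instantiated flow, which is the post-deployment step described above.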
07-05-2016
11:57 AM
5 Kudos
For unit testing custom processors, NiFi has a test framework; see https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2.0.1/bk_DeveloperGuide/content/testing.html for details about the TestRunner and the nifi-mock package. It lets you build quick mock flows and provides a number of assertions for testing processors, so processor units can be tested in a fully automated way. If you're looking for more of an integration-testing approach for a whole flow, that is a little more involved. You could use the NiFi framework to programmatically create flows around mock repositories. However, for a full integration test in a live NiFi instance, I would recommend injecting tracer messages or objects at the relevant ingest points and testing against the results, possibly using LogAttribute or routing on metadata to check the integrity of your results. GenerateFlowFile can also be useful for testing flows. To automate these in a CI environment, you might look at deploying templates with a script into a CI-hosted NiFi instance.
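As a minimal sketch of what a TestRunner-based unit test looks like (the processor class, its property, relationship, and attribute names below are hypothetical stand-ins for your own processor):

```java
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class MyCustomProcessorTest {

    @Test
    public void testFlowFileRoutedToSuccess() {
        // MyCustomProcessor, SOME_PROPERTY and REL_SUCCESS are placeholders
        // for your own processor under test.
        TestRunner runner = TestRunners.newTestRunner(MyCustomProcessor.class);
        runner.setProperty(MyCustomProcessor.SOME_PROPERTY, "some-value");

        // Enqueue mock flow file content and run one iteration of onTrigger().
        runner.enqueue("test payload".getBytes());
        runner.run();

        // Assert the flow file was routed to success and inspect its output.
        runner.assertAllFlowFilesTransferred(MyCustomProcessor.REL_SUCCESS, 1);
        MockFlowFile out = runner.getFlowFilesForRelationship(MyCustomProcessor.REL_SUCCESS).get(0);
        out.assertAttributeExists("some.attribute");
        out.assertContentEquals("expected output");
    }
}
```

Because the mock framework runs entirely in memory, tests like this can run in any CI build with no NiFi instance required.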
06-23-2016
10:07 PM
3 Kudos
This was resolved by removing the state directory on the node. The node was originally configured with incorrect hostnames copied from an existing node's nifi.properties, which led to the generation of an incorrect node id and its storage in state. When the host settings were corrected, the old node id persisted in the state folder. Removing this folder corrected the problem and allowed the node to join the cluster correctly.
06-23-2016
10:04 PM
1 Kudo
When adding a node to a NiFi cluster, I receive an error saying the NCM prevented connection due to the node id already existing.
Labels:
- Apache NiFi
05-31-2016
09:28 AM
3 Kudos
The best way to do this is with the new QueryDatabaseTable processor. It has a property that lets you list maximum-value columns (usually an auto-increment id, a sequence, or something like a timestamp). NiFi then builds queries in much the same way as sqoop incremental imports. QueryDatabaseTable also maintains the processor's state, so there is no need to store the values yourself in the flow logic; this state is stored either locally or, if you are using a NiFi cluster, in ZooKeeper. The processor is usually pointed directly at a table, but it can also be applied to a view if you have a more complex query. If you use this method, you won't have to worry about the empty result set problem. That said, one interesting way of dealing with 'missing' data in a NiFi flow is the MonitorActivity processor, which can be used to trigger flow in the absence of data over a given time window; used with scheduling, that could provide the logic for your point 2. For this particular use case, though, it is moot, as you can just use QueryDatabaseTable, which does everything for you.
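For intuition about what the maximum-value-column mechanism automates, here is a hedged sketch of the equivalent incremental fetch in plain JDBC. The connection string, table, and column names are made up; QueryDatabaseTable generates the query and persists the watermark in its own state so you never write this yourself.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class IncrementalFetchSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details and schema.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host/mydb", "user", "pass")) {

            long lastMaxId = 0L; // QueryDatabaseTable keeps this in local/ZooKeeper state for you.

            // Equivalent of one scheduled run: fetch only rows beyond the last seen maximum value.
            String sql = "SELECT id, payload FROM events WHERE id > ? ORDER BY id";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, lastMaxId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastMaxId = rs.getLong("id"); // advance the max-value column watermark
                        System.out.println(rs.getString("payload"));
                    }
                }
            }
            // On the next run only rows with id > lastMaxId are fetched,
            // just as with sqoop incremental imports.
        }
    }
}
```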
05-26-2016
02:23 PM
The way to deal with this is to mark the original relationship as auto-terminated on the SplitContent processor's Settings tab.
05-26-2016
10:51 AM
Hi @Thierry Vernhet, I've added a template and a screenshot of a worked example, which should make it clearer. I suspect the problem you're seeing is around which relationship is used for output from the SplitContent processor: if you use the original relationship, or worse, both outputs, you will just get the original content back. Note also that I've used the "Leading" location in my template, since the marker is inserted at the front of a line, and have used Line-by-Line evaluation in the marker ReplaceText for better memory usage.
05-24-2016
03:41 PM
2 Kudos
You can do this by using ReplaceText to replace ^(\d{2}\/\d{2}\/\d{4}) with some delimiter not in the character set (e.g. ~$1), i.e. prepend a magic character to the beginning of each real log line. You can then use SplitContent on the byte you chose to prepend, which gives you a flow file for each log entry. However, this can be a little heavy: make sure you're running the latest version of NiFi, and if you're working with large log files, you may need to consider increasing file handle limits. The flow (template here: split-multi-line-example.xml) works for prepending and splitting; you can see that two flow files came out of the five-line log file sample I put in.
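To illustrate the marker step outside NiFi, here is a small sketch of the same regex and replacement applied with Java's Pattern class. The sample log lines are made up; ReplaceText with Line-by-Line evaluation applies the equivalent per-line substitution, and SplitContent then splits on the chosen byte.

```java
import java.util.regex.Pattern;

public class MarkerDemo {
    public static void main(String[] args) {
        // Two log entries: the second spans multiple lines (a continuation line).
        String log = "01/02/2016 first entry\n"
                + "02/02/2016 second entry\n"
                + "  continuation of second entry\n";

        // Same pattern as the ReplaceText search value; (?m) makes ^ match at each
        // line start, mirroring Line-by-Line evaluation inside NiFi.
        Pattern entryStart = Pattern.compile("(?m)^(\\d{2}/\\d{2}/\\d{4})");

        // Prepend the '~' marker only to lines that begin a new log entry...
        String marked = entryStart.matcher(log).replaceAll("~$1");
        System.out.println(marked);

        // ...so splitting on the marker yields one chunk per multi-line log entry,
        // which is what SplitContent produces as separate flow files.
        for (String entry : marked.split("~")) {
            if (!entry.isEmpty()) {
                System.out.println("--- entry ---");
                System.out.print(entry);
            }
        }
    }
}
```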
05-23-2016
11:58 AM
2 Kudos
The Spark History Server will have a list of all jobs that have run using the YARN master. If you are looking for currently running jobs, the ResourceManager will give you a full list, though this will of course also include non-Spark jobs running on your cluster. If you are running Spark standalone rather than on YARN, these views will not give you any means of listing jobs.
05-20-2016
09:57 AM
Could you provide a snippet from your nifi-app.log with the stack trace for this error? I suspect the problem is that your hadoop-azure.jar is built against the wrong version of Hadoop. What is the source of this file?