Member since 07-30-2019
333 Posts, 357 Kudos Received, 76 Solutions
10-30-2015
01:35 PM
From the HDF/NiFi standpoint, the only difference is a configuration switch in PutSolrContentStream:
Standalone connects to a Solr node directly (e.g. port 8983).
SolrCloud goes through a ZooKeeper quorum (e.g. port 2181) and can talk to multiple nodes.
10-27-2015
03:38 PM
I noticed this change doesn't persist after a restart. Any pointers here?
10-26-2015
12:45 PM
3 Kudos
When one stands up SolrCloud/HDPSearch, every core/replica writes logs to /opt/lucidworks-hdpsearch/solr/server/logs. Files like 'solr-8983-console.log' grow very quickly, at a rate of several GB per day even for a lightly used index. The reason is that all INFO and DEBUG messages are written to this console log, which is great for development but a pain for production and operations. The challenge is that there are quite a few log configuration files spread around the Solr directories, and editing them will not necessarily yield the desired result. It gets even more complicated with SolrCloud, where one is dealing with a cluster of nodes.
There is, however, a very straightforward way to quickly address this issue:
1. Navigate to the admin console, e.g. http://xxx-solr-1:8983/solr
2. Click on Logging -> Level
3. Select the root logger (click on the level value) and change it to WARN
The change is applied at runtime, no restart required.
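The same runtime change can also be scripted. Below is a minimal sketch, assuming your Solr version exposes the admin/info/logging handler that backs the Logging screen and accepts a set=<logger>:<level> parameter. The helper names logging_url and set_log_level are made up for illustration. The setting most likely applies only to the node being called, so on SolrCloud it would need to be repeated per node, and as a runtime change it should not be expected to survive a restart.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def logging_url(base_url: str, logger: str = "root", level: str = "WARN") -> str:
    """Build the request URL that asks one Solr node to change a logger level.

    base_url is the Solr root, e.g. http://xxx-solr-1:8983/solr
    """
    query = urlencode({"set": f"{logger}:{level}", "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/info/logging?{query}"


def set_log_level(base_url: str, logger: str = "root", level: str = "WARN",
                  timeout: float = 10.0) -> int:
    """Send the request to a single node and return the HTTP status code."""
    with urlopen(logging_url(base_url, logger, level), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical node list; replace with the hosts of your own cluster.
    for node in ["http://xxx-solr-1:8983/solr"]:
        print(logging_url(node))
```

Calling set_log_level(node) for each node in a loop would apply the WARN level across the cluster without clicking through every admin console.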
10-23-2015
04:45 PM
You are missing a dependency from the example. Note that this is an external link, neither created nor endorsed by Hortonworks. The example instructions mention it in the first paragraph:
"Note: this flow depends on nifi-websocket module, download nar and copy to $NIFI_HOME/lib"
The module source code is here: https://github.com/xmlking/nifi-websocket
10-23-2015
03:06 PM
1 Kudo
Consider a flow where records are inserted into an RDBMS (or Phoenix/HBase in this instance). PutSQL and similar processors depend on a database connection, provided by a DBCPConnectionPool controller service (Controller Settings -> Controller Services). The best part: when one creates a template, all linked services are automatically included. When the template is rehydrated in another NiFi instance, those services are automatically added to the new NiFi flow.
10-22-2015
01:24 PM
1 Kudo
The inner workings of the MergeContent processor can certainly be challenging to understand. If you are running into the nifi.queue.swap.threshold limit with MergeContent, as described in NIFI-697, increase that value in the nifi.properties file and restart your NiFi process. A multiple of 10000 is recommended. You will also likely have to increase your Java memory settings in bootstrap.conf.
MergeContent works like this. When a FlowFile arrives at MergeContent, it is assigned to a bin based on Merge Strategy and Correlation Attribute Name. Maximum Number of Bins controls resource usage: if all bins have FlowFiles in them and another FlowFile arrives that doesn't fit into any of them, the oldest bin is automatically marked as complete and the new FlowFile starts its own new bin.
A bin is complete once ((number of files in bin) >= Minimum Number of Entries AND (number of bytes in bin) >= Minimum Group Size) OR the bin has existed for Max Bin Age. The FlowFiles in the bin are then merged and sent to an output relationship.
Maximum Number of Entries and Maximum Group Size prevent bins from becoming "over full". For example, when Maximum Group Size is 1 GB and a bin currently holds 900 MB, a newly arriving 200 MB FlowFile is not added to that bin, which would make it "over full", but gets a bin all to itself.
Credit goes to Michael Moser from the NiFi user list.
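The binning rules described above can be modeled in a few lines. This is a simplified sketch for reasoning about the settings, not NiFi's actual implementation: the Bin class and the is_complete, fits and assign helpers are invented for illustration, and the model ignores the correlation attribute and the Maximum Number of Bins eviction.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class Bin:
    """Toy stand-in for a MergeContent bin: creation time plus FlowFile sizes."""
    created_at: float
    sizes: List[int] = field(default_factory=list)

    @property
    def entries(self) -> int:
        return len(self.sizes)

    @property
    def total_bytes(self) -> int:
        return sum(self.sizes)


def is_complete(b: Bin, now: float, min_entries: int, min_bytes: int,
                max_age: Optional[float] = None) -> bool:
    """(entries >= min AND bytes >= min) OR the bin has reached Max Bin Age."""
    full_enough = b.entries >= min_entries and b.total_bytes >= min_bytes
    expired = max_age is not None and (now - b.created_at) >= max_age
    return full_enough or expired


def fits(b: Bin, size: int, max_entries: int, max_bytes: int) -> bool:
    """True if adding a FlowFile of this size would not make the bin over full."""
    return b.entries + 1 <= max_entries and b.total_bytes + size <= max_bytes


def assign(bins: List[Bin], size: int, now: float,
           max_entries: int, max_bytes: int) -> Bin:
    """Put the FlowFile into the first bin it fits, else start a new bin."""
    for b in bins:
        if fits(b, size, max_entries, max_bytes):
            b.sizes.append(size)
            return b
    new_bin = Bin(created_at=now, sizes=[size])
    bins.append(new_bin)
    return new_bin


if __name__ == "__main__":
    # The 900 MB + 200 MB example from the post, with a 1 GB Maximum Group Size.
    bins = [Bin(created_at=0.0, sizes=[900 * MB])]
    assign(bins, 200 * MB, now=1.0, max_entries=1000, max_bytes=1 * GB)
    print(len(bins))
```

In the example run, the 200 MB FlowFile does not fit into the 900 MB bin, so the list ends up holding two bins.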
10-21-2015
06:12 PM
11 Kudos
Once one moves beyond trivial flow design, a tangle of crossing connection lines becomes a common sight. It gets messier over time and harder to take in at a glance. Here's a great tip: double-click on the connection line that you want to bend. A new yellow anchor will appear, which one can then drag around to organize things nicely. Bonus tip: one can add multiple bend points to a connection. To remove a specific anchor, simply double-click on it again, and repeat for each yellow point. And an extra bonus courtesy of @mgilman@hortonworks.com: a connection label can be moved along the connection and snaps to a bend point.
10-13-2015
12:58 PM
35 Kudos
Update: added a GitBook link.
The unofficial little black book of Kerberos, created and maintained by a HWX engineer, Steve Loughran. It covers lots of questions that you were afraid to ask. Many advanced customers have found it a very useful guide, especially when one needs to develop solutions and code for a Kerberized cluster. I felt this guide needed much more exposure than it has had so far. All credit goes to @stevel@hortonworks.com
https://github.com/steveloughran/kerberos_and_hadoop
Click on the Contents links, or enjoy it as a GitBook, readable online, on mobile, and exportable as e.g. a PDF: https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/
10-08-2015
03:41 PM
If you installed NiFi as an Ambari service, there is a field controlling the runtime account for NiFi. It's 'nifi' by default.
10-06-2015
12:47 PM
14 Kudos
A series of examples and flow files: https://github.com/xmlking/nifi-examples
NiFi Examples
Apache NiFi example flows.
collect-stream-logs: This flow shows a workflow for log collection, aggregation, storage and display. Ingest logs from folders. Listen for syslogs on a UDP port. Merge syslogs and drop-in logs and persist the merged logs to Solr for historical search. Dashboard: stream real-time log events to a dashboard and enable cross-filter search on historical log data.
csv-to-json: This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.
decompression: This flow demonstrates taking an archive that was created with several levels of compression and then continuously decompressing it in a loop until the archived file is extracted.
http-get-route: This flow pulls from a web service (the example is NiFi itself), extracts text from a specific section, makes a routing decision on that extracted value, and prepares to write to disk using PutFile.
invoke-http-route: This flow demonstrates how to call an HTTP service based on an incoming FlowFile, and route the original FlowFile based on the status code returned from the invocation. In this example, every 30 seconds a FlowFile is produced, an attribute is added to the FlowFile that sets q=nifi, google.com is invoked for that FlowFile, and any response with a 200 is routed to a relationship called 200.
retry-count-loop: This process group can be used to maintain a count of how many times a FlowFile goes through it. If the count reaches some configured threshold, the FlowFile is routed to a 'Limit Exceeded' relationship; otherwise it is routed to 'retry'. Great for processes which you only want to run X number of times before you give up.
split-route: This flow demonstrates splitting a file on line boundaries, routing the splits based on a regex in the content, merging the less important files together for storage somewhere, and sending the higher priority files down another path to take immediate action.
twitter-garden-hose: This flow pulls from Twitter using the garden hose setting; it pulls out some basic attributes from the JSON and then routes only those items that are actually tweets.
twitter-solr: This flow shows how to index tweets with Solr using NiFi. Pre-requisites for this flow are NiFi 0.3.0 or later, the creation of a Twitter application, and a running instance of Solr 5.1 or later with a tweets collection. Here are sample steps to set this up (along with Banana dashboard) on HDP Sandbox.
Other examples: https://github.com/hortonworks-gallery/nifi-templates
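The counting logic behind retry-count-loop can be pictured with a small model. This is only a sketch of the idea, not the template's actual processors: the route_retry function and the retry.count attribute name are hypothetical, and FlowFile attributes are represented as a plain dictionary of strings.

```python
from typing import Dict, Tuple


def route_retry(attributes: Dict[str, str], limit: int,
                counter_attr: str = "retry.count") -> Tuple[str, Dict[str, str]]:
    """Increment the pass counter and pick the relationship to route to.

    Returns the relationship name and a copy of the attributes carrying
    the updated counter, leaving the input dictionary untouched.
    """
    count = int(attributes.get(counter_attr, "0")) + 1
    updated = dict(attributes)
    updated[counter_attr] = str(count)
    relationship = "Limit Exceeded" if count >= limit else "retry"
    return relationship, updated


if __name__ == "__main__":
    attrs = {"filename": "example.txt"}
    relationship = "retry"
    while relationship == "retry":
        relationship, attrs = route_retry(attrs, limit=3)
        print(relationship, attrs["retry.count"])
```

Keeping the counter in an attribute means the count travels with each FlowFile, so different FlowFiles passing through the same loop are tracked independently.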