05-12-2016
01:56 PM
Hey @Jobin George , a very fun flow. You could use different voices for a stock going up or down, check out http://www.techradar.com/us/how-to/computing/apple/terminal-101-making-your-mac-talk-with-say-1305649
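For reference, here's a quick sketch of how one might do that with the macOS say command (the "Good News"/"Bad News" voices ship with stock macOS; the ticker text is made up):

```shell
# List the voices available on this Mac
say -v '?'
# Use a different voice depending on the stock's direction
say -v "Good News" "AAPL is up 2 percent"
say -v "Bad News" "AAPL is down 2 percent"
```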
05-12-2016
01:14 PM
Thank you Jim, I've updated the Ethernet section to give you credit.
05-11-2016
12:26 AM
19 Kudos
Overview

Raspberry Pi is an interesting little machine, even more so with the release of the Raspberry Pi 3. While it may not technically be the most powerful device in this form factor, it definitely wins on overall community, breadth of support and stability. Naturally, there is sizable interest in running something like NiFi on this little edge device. And that's exactly what we have been doing for the past 8 months (as of May 2016). Today I'd like to share a set of best practices for running NiFi on the Pi that came out of this exercise. In general, the recommendations fall into 2 categories:

Raspberry Pi configuration – OS and system-level settings for running an always-on NiFi instance.
NiFi configuration – tweaks and changes within NiFi itself.

Let's start with some prep work.

Raspberry Pi Configuration

I'm assuming the Raspberry Pi is fully under one's control, including physical access to the SD card and USB slots, together with a root account.

SD Card

If you bought a starter kit, it probably came with the NOOBS image pre-installed on the SD card. That's fine for all kinds of projects, but running things 24x7 is a different story. Double-check the manufacturer and use a brand-name SD card like SanDisk. Unfortunately, the no-name cards bundled with starter kits tend to corrupt much more often, and that is not what you want for an always-on service. Go for a faster SD card, e.g. look for cards carrying the 4K video designation and a sustained write speed of 90 MB/s or better. Your Pi will thank you.

Use Raspbian OS Lite

The default Raspberry Pi OS image comes with lots of software. All great, but useless and a waste of space if you never intend to run the Pi in desktop mode. Instead, take your SD card and re-image it with the lightweight version of the OS (Raspbian Jessie Lite at the time of writing). It drops all the desktop software, giving you more space to do what's important (i.e. running NiFi 24x7!).
This is a standard procedure which is well documented at https://www.raspberrypi.org/documentation/installation/installing-images/

Upgrade OS Packages

You are encouraged to upgrade the OS to pick up any maintenance releases published since the OS image was built:

sudo apt-get update && sudo apt-get dist-upgrade

Go have a coffee or make a sandwich.

Note on Initial Setup

Of course, there is the little chicken-and-egg problem of having to connect a typical desktop monitor and keyboard first to go through the initial boot and configuration. One has 3 options:

1. Plug in an Ethernet cable and access the Pi over SSH. You may need access to the router or a basic network-scanning utility to find the Pi's address. @Jim Heaton has more tips in the comment section below. Thanks!
2. Connect a monitor and keyboard, configure WiFi, start up the SSH daemon and set your Pi free.
3. Hardcore, but lots of fun - connect directly to a serial console over a TX/RX/GND wire combination. Finally a good use for that array of GPIO connectors! You can either buy one of those USB-to-TTL cables or leverage e.g. a BusPirate device if you have one.

If the above makes sense, you can probably find your way from here, but let me know if you're interested in knowing more. I can only mention that the serial console saved me many times, even in a disk-full, SSH-daemon-down, SD-card-corrupted situation (or a combination of all three).

Expand the Root Filesystem

Once the lightweight OS has been put on the SD card and the Raspberry Pi boots up, there's one thing often overlooked. The default root partition is around 1.2GB and it will eventually fill up. Cleaning up the space afterwards is very tedious, as there is no obvious single large file one could delete to reclaim it. Instead, use the whole SD card (remember, NiFi will be running on another mount anyway). Log into your Pi and run:

sudo raspi-config

Select option #1 - Expand Filesystem. The system will expand the root partition to use all available SD card space and reboot.
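After the reboot, a quick way to confirm the root partition now spans the card (df is standard; the sizes you see will vary):

```shell
# Show the size and usage of the root filesystem
df -h /
```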
Configure OS Locale and Timezone

This is optional - but really not, if you plan to collect any interesting data and still make sense of it during analysis. While in the same raspi-config screen, select option #5 Internationalization and update your Pi's Locale and Timezone.

External Storage

If you can, consider external storage for hosting everything NiFi (i.e. not on the same SD card where your OS lives). One can go as far as plugging a huge external drive into a USB port (hey, no problem, just ensure it has its own power supply!). But we found a common USB flash drive (or several) to be a suitable medium, too. They seem to be less prone to corruption than the SD card; just go with a brand name and opt for higher-speed models whenever you can.

Configure USB Mounts

I will not repeat the internet - there are plenty of guides on how to do this for the Pi. One super-useful tip is to mount by the UUID of the flash drive, which guarantees the mount bindings will persist no matter which USB port you plug it into. Gives the warm and fuzzy, which we all love. Here's my favorite guide: http://www.raspberrypi-spy.co.uk/2014/05/how-to-mount-a-usb-flash-disk-on-the-raspberry-pi/ . Use either a UUID or a drive label, just don't use the actual device name, as it can change when you reconfigure the disks.

Disable Access Time Recording

Additionally, modify the mounts for the SD card and USB flash drives to disable recording of access times for files and directories. This minimizes unnecessary writes to the SD card and flash drives:

sudo vi /etc/fstab
# modify the options for a mount to add ‘noatime,nodiratime’
# save changes
# repeat for every mount point which you updated above, example for root below
sudo mount -o remount /
# to verify changes - look for your settings in the output
mount

E.g. here's what my fstab looks like with an SD card and 2 USB flash drives:

proc /proc proc defaults 0 0
/dev/mmcblk0p1 /boot vfat defaults 0 2
/dev/mmcblk0p2 / ext4 defaults,noatime 0 1
LABEL=USBFLASH1 /mnt/flash1 ext4 defaults,noatime,nodiratime,nofail 0 0
LABEL=USBFLASH2 /mnt/flash2 ext4 defaults,noatime,nodiratime,nofail 0 0
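If you prefer UUIDs over labels, here's a sketch (the device name and UUID below are made-up examples; blkid prints the real ones):

```shell
# Find the filesystem UUID of the flash drive (device name is an example)
sudo blkid /dev/sda1
# Then reference it in /etc/fstab instead of a LABEL= entry, e.g.:
# UUID=0aa1b2c3-d4e5-f607-8899-aabbccddeeff /mnt/flash1 ext4 defaults,noatime,nodiratime,nofail 0 0
```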
Update: a more robust fstab options string - if your Pi hangs on boot with auto-mounted drives, try adding the nofail option.

Disable WiFi Power Saving

This is kinda critical and will bite you every time if you forget. By default, the Raspberry Pi shuts down WiFi after some inactivity period, only to re-enable it when there's an incoming connection. Great in theory. In practice, you face a 30-40 second delay when trying to access the Pi over SSH or hit the NiFi UI. The procedure is slightly different for the Pi 2 & 3, so I'm listing both here:

Raspberry Pi 2: follow the instructions at http://www.raspberrypi-spy.co.uk/2015/06/how-to-disable-wifi-power-saving-on-the-raspberry-pi/

Raspberry Pi 3:

sudo iw dev wlan0 set power_save off

Create a NiFi User Account

We will not be running NiFi from the SD card. Let's ensure the home directory isn't there either. Create a nifi user account and configure its home location to be on the new USB mount (e.g. if there is a /mnt/flash1):

sudo useradd -m -d /mnt/flash1/nifi nifi
sudo chmod 750 /mnt/flash1/nifi
# and here’s how one would log into the account
# sudo su - nifi

Configure OS Kernel Limits

Follow the configuration best practices in the NiFi Administration Guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices I, however, configure the limits for the nifi user alone, rather than for every account on the host, e.g. replace the star symbols with 'nifi':

nifi hard nofile 50000
nifi soft nofile 50000
nifi hard nproc 10000
nifi soft nproc 10000

Install JDK

Java 8 is highly recommended, largely because the PermGen space is gone (replaced by Metaspace) and no longer requires any special treatment - great for systems like NiFi. Important: we had some reports of OpenJDK behaving less stably than the Oracle JDK in the ARM build (which is what the Pi uses). Save yourself some trouble, install the Oracle JDK and move on:

sudo apt-get update && sudo apt-get install oracle-java8-jdk

NiFi Configuration

NiFi installation is trivial: download from https://nifi.apache.org/download.html and unpack. Make sure to unpack into the nifi user's home directory (e.g. /mnt/flash1/nifi).

Let's Get Lean!

Before anything else, it's important to remove modules which don't make sense on the Raspberry Pi. This will improve startup times considerably and reduce strain on the system. You have the final say on what goes and what stays, but here are e.g. the contents of my $NIFI_HOME/lib directory (you can delete all other NAR files):

bootstrap
jcl-over-slf4j-1.7.12.jar
jul-to-slf4j-1.7.12.jar
log4j-over-slf4j-1.7.12.jar
logback-classic-1.1.3.jar
logback-core-1.1.3.jar
nifi-api-0.6.1.jar
nifi-documentation-0.6.1.jar
nifi-framework-nar-0.6.1.nar
nifi-html-nar-0.6.1.nar
nifi-http-context-map-nar-0.6.1.nar
nifi-jetty-bundle-0.6.1.nar
nifi-kerberos-iaa-providers-nar-0.6.1.nar
nifi-ldap-iaa-providers-nar-0.6.1.nar
nifi-nar-utils-0.6.1.jar
nifi-properties-0.6.1.jar
nifi-provenance-repository-nar-0.6.1.nar
nifi-runtime-0.6.1.jar
nifi-scripting-nar-0.6.1.nar
nifi-ssl-context-service-nar-0.6.1.nar
nifi-standard-nar-0.6.1.nar
nifi-standard-services-api-nar-0.6.1.nar
nifi-update-attribute-nar-0.6.1.nar
slf4j-api-1.7.12.jar

Repositories Configuration

If you have multiple USB drives, you can leverage striping for the content and provenance repositories. This will improve the overall throughput of NiFi. For great background and tuning tips see: https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html Reference documentation for all properties is available at https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#system_properties

Tame Those Logs

NiFi can get pretty chatty about, well, everything. In an environment that is sensitive to random writes (like the Raspberry Pi) we are better off changing a few things. Edit $NIFI_HOME/conf/logback.xml and introduce the following changes.

Only ERROR and above messages by default:

<root level="ERROR">
<appender-ref ref="APP_FILE"/>
</root>

NiFi packages at WARN and above (add the logger if missing):

<logger name="org.apache.nifi" level="WARN"/>

Compress and roll daily logs for nifi-app.log: find the appender section for nifi-app.log and modify the file pattern to read as below (note the .gz extension):

<fileNamePattern>./logs/nifi-app_%d{yyyy-MM-dd_HH}.%i.log.gz</fileNamePattern>

Bonus tip: there's no need to restart NiFi to pick up changes in the logging configuration. The file is checked for changes every 30 seconds and NiFi reconfigures logging on the fly.

Closing Thoughts

Once again, these are only a few tips which proved very useful while running NiFi on a Raspberry Pi. There are more tuning steps available depending on the kind of data flowing through it, but I hope this gets you started and provides sufficient guard rails. Let us know your thoughts!
12-23-2015
03:20 PM
I would highly recommend chroot'ing the SolrCloud config, otherwise it dumps all entries at the root of a ZooKeeper tree. See https://community.hortonworks.com/content/kbentry/7081/best-practice-chroot-your-solr-cloud-in-zookeeper.html for details.
12-17-2015
03:32 PM
3 Kudos
These instructions assume you don't need to preserve Solr indexes. If you do, modify the ZK commands to move nodes instead of removing them.

Stop every SolrCloud node:

# on every SolrCloud node
su - solr
cd /opt/lucidworks-hdpsearch/solr/bin/
./solr stop -all
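Before moving on to ZooKeeper, you can double-check that nothing is left running (a sketch; bin/solr status reports any Solr nodes it finds on the box):

```shell
# still as the solr user, in /opt/lucidworks-hdpsearch/solr/bin/
./solr status
```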
Connect to your ZooKeeper quorum and run the zkCli shell:

su - zookeeper
cd /usr/hdp/current/zookeeper-client/bin/
# point it to a ZK quorum (or just a single ZK server is ok, e.g. localhost)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181

If you followed best practices and chroot'ed your SolrCloud already, then things are easy (and you are probably done by now):

# in ZK cli shell
rmr /solr

More often than not, this wasn't the case, so perform the following operations on your ZK tree:

# in ZK cli shell
rmr /clusterstate.json
rmr /aliases.json
rmr /live_nodes
rmr /overseer
rmr /overseer_elect
rmr /collections
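A quick sanity check before leaving the shell (a sketch; listing the root should no longer show the Solr entries removed above):

```shell
# in ZK cli shell - verify the Solr nodes are gone
ls /
```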
Next, follow this best practices article, create a new ZK home for your SolrCloud cluster and start it up in chroot'ed mode.
12-17-2015
03:05 PM
1 Kudo
When running Solr in clustered mode (SolrCloud), it has a runtime dependency on ZooKeeper, where it stores configs, coordinates leader election, tracks replica allocation, etc. All in all, there's a whole tree of ZK nodes created, with sub-nodes. Deploying SolrCloud into a Hadoop cluster usually means re-using the centralized ZK quorum already maintained by HDP. Unfortunately, if not explicitly taken care of, SolrCloud will happily dump all its ZK content in the ZK root, which really complicates things for an admin down the line. If you need to clean up your ZK first, take a look at this how-to. The solution is to put all SolrCloud ZK entries under their own ZK node (e.g. /solr). Here's how one does it:

su - zookeeper
cd /usr/hdp/current/zookeeper-client/bin/
# point it to a ZK quorum (or just a single ZK server is ok, e.g. localhost)
./zkCli.sh -server lake02:2181,lake03:2181,lake04:2181
# in zk shell now
# note the empty brackets are _required_
create /solr []
# verify the zk node has been created, must not complain the node doesn't exist
ls /solr
quit
# back in the OS shell
# start SolrCloud and tell it which ZK node to use
su - solr
cd /opt/lucidworks-hdpsearch/solr/bin/
# note how we add '/solr' to a ZK quorum address.
# it must be added to the _last_ ZK node address
# this keeps things organized and doesn't pollute root ZK tree with Solr artifacts
./solr start -c -z lake02:2181,lake03:2181,lake04:2181/solr
# alternatively, if you have multiple IPs on your Hadoop nodes and have
# issues accessing Solr UI and dashboards, try binding it to an address explicitly:
./solr start -c -z lake02:2181,lake03:2181,lake04:2181/solr -h $HOSTNAME
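Once SolrCloud is up, you can verify it registered under the chroot (a sketch using the same zkCli shell as above; the node names you see will differ):

```shell
# back in the zkCli shell
ls /solr
ls /solr/live_nodes
```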
12-12-2015
07:48 AM
1 Kudo
Great content. For developer productivity, we should ask @dkumar@hortonworks.com to share the in-memory dev stack of ZK, Kafka and Storm - the one they are using in the labs and training.
11-10-2015
07:55 PM
11 Kudos
Today we will show how to interact with a NiFi instance to modify a flow at runtime via the API.

Pre-requisites
NiFi installed and running on localhost: https://nifi.apache.org/download.html
Groovy - because its JSON builders and REST DSLs are great. If you are on a Mac, the easiest route is to run brew install groovy. To install Homebrew (the superb Mac package manager), visit http://brew.sh/
The full script is available on GitHub: https://github.com/aperepel/nifi-rest-api-tutorial
NiFi REST API docs: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html

Here's the test flow we will be working with today. Prepare the test flow:
1. Add a PutFile component to the canvas. Rename the processor to Save File (right-click -> Configure -> Settings -> Name field). We will be using this name to look up the processor later via the API.
2. Add a GetHTTP processor and create a connection from GetHTTP to 'Save File'. GetHTTP settings can be ignored for now; the Save File processor simply needs an input connected.
3. Set the Save File properties as below (these settings will be modified programmatically next).
4. Start the Save File processor (no need to start GetHTTP for our purposes).

Note: for a more complex flow, one would use templates: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates

Working with the API

Next, we will update the Save File processor to use a different directory (/tmp/staging) and set Create Missing Directories to true. High-level script flow:
1. Search the data flow for a component to operate on. The lookup term is 'Save File'. This is the same API used by the Search field in the UI.
2. Validate there's only 1 processor returned - we want to make sure we're modifying the expected one.
3. Sync up with the framework state - get the latest version field value, to be used in the update statement next. This is a classic Optimistic Locking pattern implementation.
4. Build a small JSON document containing only the state changes.
5. Perform a partial update via a PUT operation.
6. Repeat steps 4-5 to stop, update the configuration (change the directory and missing-dirs properties) and start the processor.

For the impatient among us, execute the script directly (clone/checkout the GitHub repo if you want to play with the code later):

groovy https://raw.githubusercontent.com/aperepel/nifi-rest-api-tutorial/master/reconfigure.groovy

You will see output similar to this:

Looking up a component to update...
Found the component, id/group: c35f1bb7-5add-427f-864a-bdd23bb4ac7f/f1a2c4e8-b106-4877-97d9-9dbca868fc16
Preparing to update the flow state...
Stopping the processor to apply changes...
Updating processor...
{
"revision": {
"clientId": "my awesome script",
"version": 309
},
"processor": {
"id": "c35f1bb7-5add-427f-864a-bdd23bb4ac7f",
"config": {
"properties": {
"Directory": "/tmp/staging",
"Create Missing Directories": "true"
}
}
}
}
Updated ok.
Bringing the updated processor back online...
Ok

If you check the NiFi processor again, you will see the updated Directory and Create Missing Dirs values. Additionally, every step has been captured and recorded in the flow history. When you see a warning message in the UI, simply hit the Refresh link right next to it - I will explain the concurrency controls at the end of this article.

Code Walkthrough

First, we pull in a dependency: https://github.com/jgritman/httpbuilder/wiki/RESTClient . It is available in a public Maven repository and is fetched automatically.

@Grab(group='org.codehaus.groovy.modules.http-builder',
module='http-builder',
version='0.7.1')
This allows us to use a nice REST DSL, like these calls:

nifi.get(
path: 'controller/search-results',
query: [q: processorName]
)
nifi.put(
path: "controller/process-groups/$processGroup/processors/$processorId",
body: builder.toPrettyString(),
requestContentType: JSON
)
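The same two calls can be sketched with plain curl against the 0.x API paths shown above (the base URL, group id, processor id and version are placeholders, not values from a real instance):

```shell
BASE=http://localhost:8080/nifi-api
# GET: search the flow for a component by name
curl "$BASE/controller/search-results?q=Save%20File"
# PUT: partial update, sending only the fields to change
curl -X PUT -H 'Content-Type: application/json' \
     -d '{"revision":{"clientId":"my awesome script","version":309},"processor":{"id":"PROCESSOR_ID","config":{"properties":{"Directory":"/tmp/staging"}}}}' \
     "$BASE/controller/process-groups/GROUP_ID/processors/PROCESSOR_ID"
```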
Next, we use Groovy's JSON builder to construct a JSON document for a partial PUT update, i.e. we only specify the properties we want to change, like this:

builder {
revision {
clientId 'my awesome script'
version resp.data.revision.version
}
processor {
id "$processorId"
config {
properties {
'Directory' '/tmp/staging'
'Create Missing Directories' 'true'
}
}
}
}
}

The dot-notation variables navigate the JSON document tree from a previous response. To understand how to structure it, start by issuing a GET request against your processor, which will fetch a complete state document. Tip: the UI does everything through the REST API, so it's a great interactive learning tool in itself. One note, though: the UI will interchangeably leverage both PUT and POST (form) requests, so choose whichever is more convenient. In this write-up we are using PUT with JSON. Finally, the clientId and version business is explained in the next section.

Optimistic Locking in NiFi

The diagram below describes the concept. Supplying a clientId is required for update operations to avoid running into consistency issues (the API will respond with a 409 Conflict status code, which will be really confusing if a developer doesn't know about this attribute). controller/revision returns, among other things, the clientId of the user who last modified the flow. This is NOT always your id; best practice is to supply your own unique value to identify the client. It's actually a free-form value - a UUID is just the default that the framework generates for you if missing.
11-06-2015
02:56 PM
Ali, in step #2 it should read 'on the right' - currently it guides the reader to the left, which is misleading.
10-30-2015
01:35 PM
From the HDF/NiFi standpoint, the only difference would be in a configuration switch for PutSolrContentStream: Standalone connects to a Solr node directly (e.g. port 8983), while SolrCloud goes through a ZooKeeper quorum (e.g. port 2181) and can talk to multiple nodes.
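As a sketch, the relevant PutSolrContentStream properties would look like this (hosts, ports and collection name are made-up examples):

```
Standalone:
  Solr Type     = Standalone
  Solr Location = http://solr-host:8983/solr/mycollection
SolrCloud:
  Solr Type     = Cloud
  Solr Location = lake02:2181,lake03:2181,lake04:2181/solr
```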