Member since 02-09-2016

559 Posts · 422 Kudos Received · 98 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2864 | 03-02-2018 01:19 AM |
|  | 4592 | 03-02-2018 01:04 AM |
|  | 3068 | 08-02-2017 05:40 PM |
|  | 2870 | 07-17-2017 05:35 PM |
|  | 2103 | 07-10-2017 02:49 PM |
			
    
	
		
		
		03-04-2017
	
		
		06:05 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		4 Kudos
		
	
				
		
	
		
					
							 Objective 
 This tutorial will walk you through the process of using Ansible, an agent-less automation tool, to create instances on AWS. The Ansible playbook we will use is relatively simple; you can use it as a base to experiment with more advanced features. You can read more about Ansible here: Ansible. 
 Ansible is written in Python and is installed as a Python module on the control host. The only requirement for the hosts managed by Ansible is the ability to login with SSH. There is no requirement to install any software on the host managed by Ansible. 
 If you have never used Ansible, you can become more familiar with it by going through some basic tutorials. The following two tutorials are a good starting point: 
 
 Automate All Things With Ansible: Part One 
 Automate All Things With Ansible: Part Two 
 
 This tutorial is part 1 of a 2 part series. Part 2 in the series will show you how to use Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS). 
 This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare 
 You can get a copy of the playbook from this tutorial here: Github 
 Prerequisites 
 
 You must have an existing AWS account. 
 You must have access to your AWS Access and Secret keys. 
 You are responsible for all AWS costs incurred. 
 
 Scope 
 This tutorial was tested using the following environment and components: 
 
 Mac OS X 10.11.6 and 10.12.3 
 Amazon Web Services 
 Anaconda 4.1.6 (Python 2.7.12) 
 Ansible 2.0.0.2 and 2.1.3.0 
 
 Steps 
 Create a project directory 
 You need to create a directory for your Ansible playbook. I prefer to create my project directories in ~/Development. 
 mkdir ~/Development/ansible-aws
cd ~/Development/ansible-aws
 
 Install Ansible module 
 If you use the Anaconda version of Python, you already have access to Ansible. If you are not using Anaconda, then you can usually install Ansible using the following command: 
  pip install ansible  
 To read more about how to install Ansible: Ansible Installation 
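 Note that the  ec2  and  ec2_group  modules used later in this tutorial also need the  boto  Python library available to the same Python interpreter that runs Ansible. Here is a minimal sanity check, assuming you installed everything with pip: 
 # Confirm that both ansible and boto can be imported by this Python.
# The ec2/ec2_group modules used in this playbook depend on boto.
import importlib

for name in ("ansible", "boto"):
    try:
        importlib.import_module(name)
        print("%s is available" % name)
    except ImportError:
        print("%s is missing -- try: pip install %s" % (name, name))
 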
 Overview of our Ansible playbook 
 Our playbook is relatively simple. It consists of a single inventory file, a single group_vars file, and a single playbook file. Here is the layout of the file and directory structure: 
 +- ansible-aws/
   |
   +- group_vars/
   |  +- all
   |
   +- inventory/
   |  +- hosts
   |
   +- playbooks/
   |  +- ansible-aws.yml
 
 group_vars/all 
 You can use variables in your playbooks using the  {{variable name}}  syntax. These variables are populated based on values stored in your variable files. You can explicitly load variable files in your playbooks. 
 However, all playbooks will automatically load the variables in the  group_vars/all  variable file. The  all  variable file is loaded for all hosts regardless of the groups the host may be in. In our playbook, we are placing our AWS configuration values in the  all  file. 
 Edit the  group_vars/all  file. Copy and paste the following text into the file: 
 aws_access_key: <enter AWS access key>
aws_secret_key: <enter AWS secret key>
key_name: <enter private key file alias name>
aws_region: <enter AWS region>
vpc_id: <enter VPC ID>
ami_id: ami-6d1c2007
instance_type: m4.2xlarge
my_local_cidr_ip: <enter cidr_ip>
 
 
  aws_access_key : You need to enter your AWS Access key 
  aws_secret_key : You need to enter your AWS Secret key 
  key_name : The alias name you gave to the AWS private key which you will use to SSH into the instances. In my case I created a key called  ansible . 
  aws_region : The AWS region where you want to deploy your instances. In my case I am using  us-east-1 . 
  vpc_id : The specific VPC in which you want to place your instances. 
  ami_id : The specific AMI you want to deploy for your instances. The  ami-6d1c2007  AMI is a CentOS 7 image. 
  instance_type : The type of AWS instance. For deploying Hadoop, I recommend at least  m4.2xlarge . A faster alternative is  c4.4xlarge . 
  my_local_cidr_ip : Your local computer's CIDR IP address. This is used for creating the security rules that allow your local computer to access the instances. An example CIDR format is  192.168.1.1/32 . Make sure this is set to your computer's public IP address (see the sketch below for one way to look it up). 
 
 After you have entered your appropriate settings, save the file. 
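 One quick way to find the public IP address to use for  my_local_cidr_ip  is the short Python sketch below. It assumes your machine has outbound internet access and uses api.ipify.org, a public "what is my IP" service that is not otherwise part of this tutorial: 
 # Print this computer's public IP in the /32 CIDR form expected by
# my_local_cidr_ip (for example 203.0.113.7/32).
import urllib2  # Python 2; on Python 3 use urllib.request

ip = urllib2.urlopen("https://api.ipify.org").read().strip()
print("%s/32" % ip)
 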
 inventory/hosts 
 Ansible requires a list of known hosts against which playbooks and tasks are run. We will tell Ansible to use a specific host file with the  -i inventory/hosts  parameter. 
 Edit the  inventory/hosts  file. Copy and paste the following text into the file: 
 [local]
localhost ansible_python_interpreter=/Users/myoung/anaconda/bin/python
 
 
  [local] : Defines the group the host belongs to. You have the option for a playbook to run against all hosts, a specific group of hosts, or an individual host. This AWS playbook only runs on your local computer. That is because it uses the AWS APIs to communicate with AWS. 
  localhost : This is the hostname. You can list multiple hosts, 1 per line under each group heading. A host can belong to multiple groups. 
  ansible_python_interpreter : Optional entry that tells Ansible which specific version of Python to run. Because I am using Anaconda Python, I've included that setting here. 
 
 After you have entered your appropriate settings, save the file. 
 playbooks/ansible-aws.yml 
 The playbook is where we define the list of tasks we want to perform. Our playbook consists of 2 tasks. The first task is to create a specific AWS Security Group. The second task is to create a specific configuration of 6 instances on AWS. 
 Edit the file  playbooks/ansible-aws.yml . Copy and paste the following text into the file: 
 ---
# Basic provisioning example
- name: Create AWS resources
  hosts: localhost
  connection: local
  gather_facts: False
  tasks:
  - name: Create a security group
    ec2_group:
      name: ansible
      description: "Ansible Security Group"
      region: "{{aws_region}}"
      vpc_id: "{{vpc_id}}"
      aws_access_key: "{{aws_access_key}}"
      aws_secret_key: "{{aws_secret_key}}"
      rules:
        - proto: all
          cidr_ip: "{{my_local_cidr_ip}}"
        - proto: all
          group_name: ansible
      rules_egress:
        - proto: all
          cidr_ip: 0.0.0.0/0
    register: firewall
  - name: Create an EC2 instance
    ec2:
      aws_access_key: "{{aws_access_key}}"
      aws_secret_key: "{{aws_secret_key}}"
      key_name: "{{key_name}}"
      region: "{{aws_region}}"
      group_id: "{{firewall.group_id}}"
      instance_type: "{{instance_type}}"
      image: "{{ami_id}}"
      wait: yes
      volumes:
        - device_name: /dev/sda1
          volume_type: gp2
          volume_size: 100
          delete_on_termination: true
      exact_count: 6
      count_tag:
         Name: aws-demo
      instance_tags:
         Name: aws-demo
    register: ec2
 
 This playbook uses the Ansible ec2 and ec2_group modules. You can read more about the options available to those modules here: 
 
 ec2 
 ec2_group 
 
 The task to create the EC2 security group creates a group named  ansible . It defines 2 ingress rules and 1 egress rule for that security group. The first ingress rule allows all inbound traffic from your local computer's IP address. The second ingress rule allows all inbound traffic from any host in the  ansible  security group. The egress rule allows all traffic out from all of the hosts. 
 The task to create the EC2 instances creates  6  hosts because of the  exact_count  setting. It creates a tag called  aws-demo  on each of the instances and uses that tag to determine how many hosts exist. You can choose to use a smaller number of hosts. 
 You can specify volumes to mount on each of the instances. The default volume size is  8  GB, which is too small for deploying Hadoop later. I recommend setting the size to at least  100  GB as above. I also recommend you set  delete_on_termination  to  true . This tells AWS to delete the storage after you have deleted the instances. If you do not do this, the storage will be kept and you will be charged for it. 
 After you have entered your appropriate settings, save the file. 
 Running the Ansible playbook 
 Now that our 3 files have been created and saved with the appropriate settings, we can run the playbook. To run the playbook, you use the  ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml  command. You should see something similar to the following: 
 $ ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml
PLAY [Create AWS resources] ****************************************************
TASK [Create a security group] *************************************************
changed: [localhost]
TASK [Create an EC2 instance] **************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=2    unreachable=0    failed=0
 
 The  changed  lines indicate that Ansible found a configuration that needed to be modified to be consistent with our requested state. For the security group task, you would see this if your security group didn't exist or if you had a different set of ingress or egress rules. For the instance task, you would see this if there were fewer or more than 6 hosts tagged as  aws-demo . 
 Check the AWS console 
 If you check your AWS console, you should be able to confirm the instances are created. 
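 If you prefer checking from the command line, here is a minimal sketch using boto (the same library the  ec2  module uses) to list the instances tagged  aws-demo . The region and credentials shown are assumptions; use the same values you put in  group_vars/all : 
 # List running instances tagged Name=aws-demo and print their public IPs.
import boto.ec2

conn = boto.ec2.connect_to_region(
    "us-east-1",                                  # use your aws_region
    aws_access_key_id="<your AWS access key>",    # use your aws_access_key
    aws_secret_access_key="<your AWS secret key>")

instances = conn.get_only_instances(
    filters={"tag:Name": "aws-demo", "instance-state-name": "running"})

for instance in instances:
    print("%s  %s  %s" % (instance.id, instance.ip_address, instance.state))
 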
    
 Review 
 If you successfully followed along with this tutorial, you have created a simple Ansible playbook with 2 tasks using the ec2 and ec2_group Ansible modules. The playbook creates an AWS security group and instances which can be used later for deploying HDP on AWS. 
						
					
Posted 02-28-2017 04:40 AM · 3 Kudos
		
	
				
		
	
		
					
							 Objective  
 This tutorial is designed to walk you through the process of creating a MiniFi flow to read data from a Sense HAT sensor on a Raspberry Pi 3. The MiniFi flow will push data to a remote NiFi instance running on your computer. The NiFi instance will push the data to Solr.  
 While there are other tutorials and examples of using NiFi/MiniFi with a Raspberry Pi, most of those tutorials tend to use a more complicated sensor implementation. The Sense HAT is very easy to install and use. 
 Prerequisites 
 
 You should have a Raspberry Pi 3 Model B: Raspberry Pi 3 Model B. I recommend a 16+ GB SD card for your Raspberry Pi 3. Don't forget to expand the filesystem after the OS is installed: raspi-config 
 You should have a Sense HAT: Sense HAT. You should already have installed the Sense HAT on your Raspberry Pi 3. 
 You should already have installed Raspbian Jessie Lite on your Raspberry Pi 3 SD card: Raspbian Jessie Lite. The instructions for installing a Raspberry Pi OS can be found here: Raspberry Pi OS Install. You may be able to use the NOOBS operating system that typically ships with the Raspberry Pi. However, the Raspbian Lite OS will ensure the most system resources are available to MiniFi or NiFi. 
 You should have enabled SSH on your Raspberry Pi: Enable SSH 
 You should have enabled WiFi on your Raspberry Pi (or use wired networking): Setup WiFi 
 You should have NiFi 1.x installed and working on your computer: NiFi 
 You should have the Java MiniFi Toolkit 0.1.0 installed and working on your computer: MiniFi Toolkit 
 You should have downloaded Solr 6.x on your computer: Solr Download 
 
 Scope 
 This tutorial was tested using the following environment and components: 
 
 Mac OS X 10.11.6 and 10.12.3 
 MiniFi 1.0.2.1.1.0-2.1 
 MiniFi Toolkit 0.1.0 
 NiFi 1.1.1 
 Solr 6.4.1 
 Java JDK 1.8 
 
 Steps 
 Connect to Raspberry Pi using SSH 
 If you have completed all of the prerequisites, then you should be able to easily SSH into your Raspberry Pi. On my Mac, I connect using:  
  ssh pi@raspberrypi   
 The default username is 
  pi  and the password is  raspberry .  
 If you get an unknown host or DNS error, then you need to specify the IP address of the Raspberry Pi. You can get that by logging directly into the Raspberry Pi console.  
 Now run the 
  ifconfig  command.  
 You should see something similar to the following: 
 pi@raspberrypi:~ $ ifconfig
eth0      Link encap:Ethernet  HWaddr b8:27:eb:60:ff:5b
          inet6 addr: fe80::ec95:e79b:3679:5159/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
wlan0     Link encap:Ethernet  HWaddr b8:27:eb:35:aa:0e
          inet addr:192.168.1.204  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21f6:bf0f:5f9f:d60d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17280 errors:0 dropped:11506 overruns:0 frame:0
          TX packets:872 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3414755 (3.2 MiB)  TX bytes:133472 (130.3 KiB)
  
 If you are using WiFi, then look at the  wlan0  device. If you are using wired ethernet, then look at the  eth0  device. Now you can connect using the IP address you found: 
  ssh pi@192.168.1.204  
 Your IP address will vary. 
 Update Raspberry Pi packages 
 It's always a good idea to ensure your installed packages are up to date. Raspbian Lite is based on Debian, so you need to use  apt-get  to update and install packages. 
 First, we need to run  sudo apt-get update  to update the list of available packages and versions. You should see something similar to the following: 
 pi@raspberrypi:~ $ sudo apt-get update
Get:1 http://mirrordirector.raspbian.org jessie InRelease [14.9 kB]
Get:2 http://archive.raspberrypi.org jessie InRelease [22.9 kB]
Get:3 http://mirrordirector.raspbian.org jessie/main armhf Packages [8,981 kB]
Get:4 http://archive.raspberrypi.org jessie/main armhf Packages [145 kB]
Get:5 http://archive.raspberrypi.org jessie/ui armhf Packages [57.6 kB]
Get:6 http://mirrordirector.raspbian.org jessie/contrib armhf Packages [37.5 kB]
Get:7 http://mirrordirector.raspbian.org jessie/non-free armhf Packages [70.3 kB]
Get:8 http://mirrordirector.raspbian.org jessie/rpi armhf Packages [1,356 B]
Ign http://archive.raspberrypi.org jessie/main Translation-en_US
Ign http://archive.raspberrypi.org jessie/main Translation-en
Ign http://archive.raspberrypi.org jessie/ui Translation-en_US
Ign http://archive.raspberrypi.org jessie/ui Translation-en
Ign http://mirrordirector.raspbian.org jessie/contrib Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/contrib Translation-en
Ign http://mirrordirector.raspbian.org jessie/main Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/main Translation-en
Ign http://mirrordirector.raspbian.org jessie/non-free Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/non-free Translation-en
Ign http://mirrordirector.raspbian.org jessie/rpi Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/rpi Translation-en
Fetched 9,330 kB in 17s (542 kB/s)
Reading package lists... Done
  
 Now we can update our installed packages using 
  sudo apt-get dist-upgrade . You should see something similar to the following: 
 pi@raspberrypi:~ $ sudo apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  bind9-host libbind9-90 libdns-export100 libdns100 libevent-2.0-5 libirs-export91 libisc-export95 libisc95 libisccc90
  libisccfg-export90 libisccfg90 libjasper1 liblwres90 libpam-modules libpam-modules-bin libpam-runtime libpam0g login
  passwd pi-bluetooth raspberrypi-sys-mods raspi-config vim-common vim-tiny
24 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,767 kB of archives.
After this operation, 723 kB disk space will be freed.
Do you want to continue? [Y/n] y
  
 The list of packages and versions that need to be updated will vary. Enter  y  to update the installed packages. 
 Install additional Raspberry Pi packages 
 We need to install additional packages to interact with the Sense HAT sensor and run MiniFi. 
 You access the Sense HAT libraries using Python. Therefore the first package we need to install is Python. 
  sudo apt-get install python  
 The second package we need to install is the libraries for the Sense HAT device. 
  sudo apt-get install sense-hat  
 We will be using the Java version of MiniFi. Therefore the third package we need to install is the Oracle JDK 8. 
  sudo apt-get install oracle-java8-jdk  
 Verify Sense HAT functionality 
 Before we use MiniFi to collect any data, we need to ensure we can interact with the Sense HAT sensor. We will create a simple Python script to display a message on our Sense HAT.  
 Edit the file display_message.py using 
  vi display_message.py . Now copy and paste the following text into your text editor (remember to go into insert mode first): 
 from sense_hat import SenseHat
sense = SenseHat()
sense.show_message("Hello")
  
 Save the script using 
  :wq! . Run this script using  python display_message.py . You should see the word  Hello  scroll across the display of the Sense HAT in white text.  
 Now let's test reading the temperature from the Sense Hat. Edit the file get_temp.py using 
  vi get_temp.py . Now copy and paste the following text into your text editor (remember to go into insert mode first): 
 from sense_hat import SenseHat
sense = SenseHat()
t = sense.get_temperature()
print('Temperature = {0:0.2f} C'.format(t))
  
 Save the script using 
  :wq! . Run the script using  python get_temp.py . You should see something similar to the following (your values will vary): 
 pi@raspberrypi:~ $ python get_temp.py
Temperature = 31.58 C
  
 For our MiniFi use case, we will be looking at temperature, pressure, and humidity data. We will not use the Sense HAT display for MiniFi, so we'll only print the data to the console.  
 You can read more about the Sense HAT functions here: 
 Sense HAT API  
 Now let's create a script which prints all 3 sensor values. Edit the file get_environment.py using 
  vi get_environment.py . Copy and paste the following text into your text editor (remember to go into insert mode first): 
 from sense_hat import SenseHat
import datetime
sense = SenseHat()
t = sense.get_temperature()
p = sense.get_pressure()
h = sense.get_humidity()
print('Hostname = raspberrypi')
print('DateTime = ' + datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"))
print('Temperature = {0:0.2f} C'.format(t))
print('Pressure = {0:0.2f} Millibars'.format(p))
print('Humidity = {0:0.2f} %rH'.format(h))
  
 Save the script using 
  :wq! . Run the script using  python get_environment.py . You should see something similar to the following (your values will vary): 
Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH
  
 As you can see from the script, we are printing our date output using UTC time via the 
  utcnow()  function. We also need to ensure the data format is consumable by Solr. That is why we are using  %Y-%m-%dT%H:%M:%SZ  which is a format Solr can parse.  
 Our MiniFi flow will use the 
  ExecuteProcess  processor to run the script. So we need to create a simple bash script to run the  get_environment.py  file. Edit the file get_environment.sh using  vi get_environment.sh . Copy and paste the following text into your text editor (remember to go into insert mode first): 
 python /home/pi/get_environment.py
  
 Save the script using 
  :wq! . Make sure the script is executable by running  chmod 755 get_environment.sh . Let's make sure the bash script works OK. Run the script using  ./get_environment.sh . You should see something similar to the following (your values will vary): 
Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH
  Install MiniFi  
 We are going to install MiniFi on the Raspberry Pi. First download the MiniFi release: 
  wget http://public-repo-1.hortonworks.com/HDF/2.1.1.0/minifi-1.0.2.1.1.0-2-bin.tar.gz  
 Now you can extract it using  tar xvfz minifi-1.0.2.1.1.0-2-bin.tar.gz . 
 Now we are ready to create our NiFi and MiniFi flows. 
 Start NiFi 
 On your computer (not on the Raspberry Pi), start NiFi if you have not already done so. You do this by running 
  <nifi installation dir>/bin/nifi.sh start . It may take a few minutes before NiFi is fully started. You can monitor the logs by running  tail -f <nifi installation dir>/logs/nifi-app.log . 
 You should see something similar to the following when the UI is ready: 
 2017-02-26 14:10:01,199 INFO [main] org.eclipse.jetty.server.Server Started @40057ms
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer http://127.0.0.1:9091/nifi
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer http://192.168.1.186:9091/nifi
2017-02-26 14:10:01,697 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2017-02-26 14:10:01,697 INFO [main] org.apache.nifi.NiFi Controller initialization took 11161419754 nanoseconds.
  
 Now you should be able to access NiFi in your browser by going to 
  <hostname>:8080/nifi . The default port is  8080 . If you have a port conflict, you can change the port (my instance is running on port 9091, as shown in the log above). 
 You should see a blank NiFi canvas similar to the following: 
 
   
  
  NiFi Blank Canvas
  
  Setup Solr  
 Before we start on our NiFi flow, let's make sure Solr is running. We are going to use schemaless mode. You can easily start Solr using 
  solr -e schemaless .  
 You should see something similar to the following: 
 $ bin/solr -e schemaless
Creating Solr home directory /Users/myoung/Downloads/solr-6.4.1/example/schemaless/solr
Starting up Solr on port 8983 using command:
bin/solr start -p 8983 -s "example/schemaless/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=49659). Happy searching!
Copying configuration to new core instance directory:
/Users/myoung/Downloads/solr-6.4.1/example/schemaless/solr/gettingstarted
Creating new core 'gettingstarted' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=gettingstarted&instanceDir=gettingstarted
{
  "responseHeader":{
    "status":0,
    "QTime":1371},
  "core":"gettingstarted"}
Solr schemaless example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI
  
 As you can see, Solr created a collection called  gettingstarted . That is the name of the collection our NiFi  PutSolrContentStream  will use. 
 Create NiFi flow 
 Now we need to create our NiFi flow that will receive data from MiniFi. 
 Input Port 
 The MiniFi flow will send data to a  Remote Process Group . The  Remote Process Group  requires an  Input Port . From the NiFi menu, drag the  Input Port  icon to the canvas. 
 In the 
  Add Port  dialog that is displayed, type a name for your port. I used  From Raspberry Pi . You should see something similar to the following: 
 
  
    
  
  
 Click the blue  ADD  button. 
 ExtractText 
 From the NiFi menu, drag the 
  Processor  icon to the canvas. In the Filter box, enter  extract . You should see something similar to the following: 
 
  
    
  
  
 Select the 
  ExtractText  processor. Click on the blue  ADD  button to add the processor to the canvas.  
 Now we need to configure the 
  ExtractText  processor. Right click on the processor and select the  Configure  menu option.  
 On the 
  SETTINGS  tab of the  ExtractText  processor, you should check the  unmatched  box under  Automatically Terminate Relationships . This will drop any records which we fail to extract text from. You should see something similar to the following: 
 
  
    
  
  
 On the 
  PROPERTIES  tab of the  ExtractText  processor, there are a few changes we need to make.  
 First, we want to set  Enable Multiline Mode  to  true . This allows the Regular Expressions to match across multiple lines. This is important because our data is coming in as multiline data. 
 Second, we want to set  Include Capture Group 0  to  false . Each Regular Expression we are using has only a single capture group. If we left this value set to  true , each field we extract would get a duplicate value stored as  <attribute name>.0 , which would go unused. 
 Third, we need to add additional fields to the processor which allows us to define our Regular Expressions. If you click the 
  +  icon in the upper right corner of the dialog, you should see something similar to the following: 
 
  
    
  
 
  
 We are going to add a property called 
  hostname . This will hold the value from the line  Hostname =  in the data. Click the blue  OK  button. Now you should see another dialog where you enter the regular expression. You should see something similar to the following: 
 
  
    
  
  
 Enter the following Regular Expression: 
 Hostname = (\w+)
  
 We need to repeat this process for each of the other data elements coming from the Raspberry Pi. You should have the following extra fields defined as separate fields: 
 property: hostname
value: Hostname = (\w+)
property: datetime
value: DateTime = (\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}:\d{2}Z)
property: temperature
value: Temperature = (\d+\.\d+) C
property: humidity
value: Humidity = (\d+\.\d+) %rH
property: pressure
value: Pressure = (\d+\.\d+) Millibars
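 Before entering these in NiFi, you can sanity-check the expressions against a sample of the  get_environment.py  output. The following is a small sketch (plain Python 2, like the scripts above) and is not part of the flow itself: 
 # Run each ExtractText regular expression against a sample sensor reading.
import re

sample = """Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH"""

patterns = {
    "hostname": r"Hostname = (\w+)",
    "datetime": r"DateTime = (\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}:\d{2}Z)",
    "temperature": r"Temperature = (\d+\.\d+) C",
    "pressure": r"Pressure = (\d+\.\d+) Millibars",
    "humidity": r"Humidity = (\d+\.\d+) %rH",
}

for name, pattern in patterns.items():
    match = re.search(pattern, sample)
    print("%s = %s" % (name, match.group(1) if match else "NO MATCH"))
 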
  
 When you have entered each of these properties, you should see something similar to the following: 
 
  
    
  
  
 Click the blue  APPLY  button to save the changes. 
 AttributesToJSON 
 From the NiFi menu, drag the 
  Processor  icon to the canvas. In the Filter box, enter  attributes . You should see something similar to the following: 
 
  
    
  
  
 Select the 
  AttributesToJSON  processor. Click on the blue  ADD  button to add the processor to the canvas.  
 Now we need to configure the 
  AttributesToJSON  processor. Right click on the processor and select the  Configure  menu option.  
 On the 
  PROPERTIES  tab of the  AttributesToJSON  processor, there are a few changes we need to make.  
 For the 
  Attributes List  property, we need to provide a comma-separated list of attributes we want the processor to pass on. Click inside the Value box next to  Attributes List . Enter the following value:  
  hostname,datetime,temperature,pressure,humidity   
 For the 
  Destination  property, set the value to  flowfile-content . We need the values to be in the flowfile content itself as JSON, which is what the  PutSolrContentStream  processor expects. Otherwise the flowfile content will contain the raw data (not JSON) coming from the Raspberry Pi. This will cause Solr to throw errors because it is not able to parse the request. 
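 To make that concrete, here is a hypothetical illustration (not produced by NiFi itself) of what the flowfile content looks like after  AttributesToJSON  runs with these settings; note that attribute values are always written as JSON strings: 
 # Illustration of the JSON written into the flowfile content when
# Destination = flowfile-content and the Attributes List above is used.
import json

attributes = {
    "hostname": "raspberrypi",
    "datetime": "2017-02-27T21:20:55Z",
    "temperature": "32.90",
    "pressure": "1026.53",
    "humidity": "25.36",
}
print(json.dumps(attributes))
 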
 You should see something similar to the following: 
 
  
    
  
  
 Click the blue  APPLY  button to save the changes. 
 PutSolrContentStream 
 From the NiFi menu, drag the 
  Processor  icon to the canvas. In the Filter box, enter  solr . You should see something similar to the following: 
 
  
    
  
  
 Select the 
  PutSolrContentStream  processor. Click on the blue  ADD  button to add the processor to the canvas.  
 Now we need to configure the 
  PutSolrContentStream  processor. Right click on the processor and select the  Configure  menu option.  
 On the 
  SETTINGS  tab of the  PutSolrContentStream  processor, you should check the  connection_failure ,  failure , and  success  boxes under  Automatically Terminate Relationships . Since this is the end of the flow, we can terminate everything. You could expand on this by retrying failures, or logging errors to a text file.  
 You should see something similar to the following: 
 
  
    
  
  
 On the 
  PROPERTIES  tab of the  PutSolrContentStream  processor, we need to make a few changes.  
 Set the 
  Solr Type  property to  Standard . We don't need to run SolrCloud for our demo.  
 Set the 
  Solr Location  to  http://192.168.1.186:8983/solr/gettingstarted . You should use the IP address of your computer. We started Solr with the  gettingstarted  collection, so it's part of the URL. If we were using SolrCloud, we would put the collection name in the  Collection  property instead. 
 The first set of properties should look similar to the following: 
 
  
    
  
  
 Now we need to add fields for indexing in Solr. Click the 
  +  icon in the upper right corner of the processor. The  Add Property  dialog will be displayed. For the first field, enter  f.1  and click the  ADD  button. For the value enter  hostname_s:/hostname . The  hostname_s  part of the value says to store the content in the Solr field called  hostname_s , which uses the dynamic schema to treat this field as a string. The  /hostname  part of the value says to pull the value from the root of the JSON where the JSON node is called  hostname .  
 We need to repeat this process for each of the other data elements coming from the Raspberry Pi. You should have the following fields defined as separate fields: 
 property: f.1
value: hostname_s:/hostname
property: f.2
value: timestamp_dts:/datetime
property: f.3
value: temperature_f:/temperature
property: f.4
value: pressure_f:/pressure
property: f.5
value: humidity_f:/humidity
  
  
    
  
  
 Click the blue  APPLY  button to save the changes. 
 Connect Processors 
 Now that we have our processors on the canvas, we need to connect them. Drag the connection icon from the 
  Input Port  processor to the  ExtractText  processor.  
 Drag the connection icon from the 
  ExtractText  processor to the  AttributesToJSON  processor.  
 Drag the connection icon from the 
  AttributesToJSON  processor to the  PutSolrContentStream  processor.  
 You should have something that looks similar to the following: 
 
  
    
  
  Create MiniFi flow  
 Now we can create our MiniFi flow. 
 ExecuteProcess 
 The first thing we need to do is add a processor to execute the bash script we created on the Raspberry Pi.  
 Drag the 
  Processor  icon to the canvas. Enter  execute  in the Filter box. You should see something similar to the following: 
 
  
    
  
  
 Select the 
  ExecuteProcess  processor. Click on the blue  ADD  button to add the processor to the canvas.  
 Now we need to configure the 
  ExecuteProcess  processor. Right click on the processor and select the  Configure  menu option.  
 On the 
  SETTINGS  tab you should check the  success  box under  Automatically Terminate Relationships . You should see something similar to the following: 
 
  
    
  
  
 On the 
  Scheduling  tab we want to set the  Run Schedule  to  5 sec . This will run the processor every 5 seconds. You should see something similar to the following: 
 
  
    
  
  
 On the 
  Properties  tab we want to set the  Command  to  /home/pi/get_environment.sh . This assumes you created the scripts in the  /home/pi  directory on the Raspberry Pi.  
 Click the blue  APPLY  button to save the changes. 
 Remote Process Group 
 Now we need to add a 
  Remote Process Group  to our canvas. This is how the MiniFi flow is able to send data to NiFi. Drag the  Remote Process Group  icon to the canvas. 
 For the 
  URL  enter the URL you use to access your NiFi UI. In my case that is  http://192.168.1.186:9090/nifi . Remember the default port for NiFi is  8080 . For the  Transport Protocol  select  HTTP . You can leave the other settings as defaults. You should see something similar to the following: 
 
  
    
  
  
 Click the blue  ADD  button to add the  Remote Process Group  to the canvas. 
 Create Connection 
 Now we need to create a connection between our ExecuteProcess processor and our Remote Process Group on the canvas.  
 Hover your mouse over the  ExecuteProcess  processor. Click on the circle arrow icon and drag from the processor to the  Remote Process Group . 
 Save Template 
 We need to save the MiniFi portion of the flow as a template. Select the 
  ExecuteProcess ,  Remote Process Group  and the connection between them using the shift key to allow multi-select.  
 Click on the 
  Create Template  icon (second icon from the right on the top row) in the Operate Box on the canvas. It looks like the following: 
 
  
    
  
  
 The 
  Create Template  dialog will be displayed. Give your template a name. I used  rasbperrypi  and click the blue  CREATE  button.  
 Now click on the main NiFi menu button in the upper right corner of the UI. You should see something like the following: 
 
  
    
  
  
 Now click the 
  Templates  options. This will open the  NiFi Templates  dialog. You will see a list of templates you have created. You should see something similar to the following: 
 
  
    
  
  
 Now find the template you just created and click on the  Download  button on the right-hand side. This will save a copy of the flow in XML format on your local computer. 
 Convert NiFi Flow to MiniFi Flow 
 We need to convert the XML file NiFi generated into a YAML file that MiniFi uses. We will be using the minifi-toolkit to do this. 
 We need to run the minifi-toolkit transform command. The first option is the location of the XML file you downloaded. The second option is where to write the MiniFi flow file. MiniFi expects the flow file to be named  config.yml . 
 Run the transform command. You should see something similar to the following: 
 $ /Users/myoung/Downloads/minifi-toolkit-0.1.0/bin/config.sh transform ~/Downloads/raspberry.xml ~/Downloads/config.yml
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home
MiNiFi Toolkit home: /Users/myoung/Downloads/minifi-toolkit-0.1.0
No validation errors found in converted configuration.
  Copy MiniFi Flow to Raspberry Pi  
 Now we need to copy the flow file to the Raspberry Pi. You can easily do that using the 
  scp  command. The  config.yml  file we generated needs to go in the  /home/pi/minifi-1.0.2.1.1.0-2/conf/  directory.  
 You should see something similar to the following: 
 $ scp ~/Downloads/minifi.yml pi@raspberrypi:/home/pi/minifi-1.0.2.1.1.0-2/conf/config.yml
pi@raspberrypi's password:
minifi.yml                                                                                 100% 1962   721.1KB/s   00:00
  Start MiniFi  
 Now that the flowfile is in place, we can start MiniFi. You do that using the 
  minifi.sh  script with the  start  option. Remember that MiniFi will be running on the Raspberry Pi, not on your computer.  
 You should see something similar to the following: 
 $ /home/pi/minifi-1.0.2.1.1.0-2/minifi.sh start
minifi.sh: JAVA_HOME not set; results may vary
Bootstrap Classpath: /home/pi/minifi-1.0.2.1.1.0-2/conf:/home/pi/minifi-1.0.2.1.1.0-2/lib/bootstrap/*:/home/pi/minifi-1.0.2.1.1.0-2/lib/*
Java home:
MiNiFi home: /home/pi/minifi-1.0.2.1.1.0-2
Bootstrap Config File: /home/pi/minifi-1.0.2.1.1.0-2/conf/bootstrap.conf
  
 Now MiniFi should be running on your Raspberry Pi. If you run into any issues, look at the logs in <minifi directory>/logs/minifi-app.log. 
 Start NiFi flow 
 Now that everything else is in place, we should be able to start our NiFi flow. Start the 4 NiFi processors, not the two MiniFi parts of the flow. If everything is working properly, you should start seeing records in Solr. 
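 If you want to confirm documents are arriving without opening the Solr admin UI, a minimal sketch like the following queries the  gettingstarted  collection. It assumes Solr is still running locally on port 8983 and that the field names match the  f.1  through  f.5  mappings configured above: 
 # Query Solr for a few of the indexed sensor documents.
import json
import urllib2  # Python 2; on Python 3 use urllib.request

url = "http://localhost:8983/solr/gettingstarted/select?q=*:*&rows=5&wt=json"
response = json.load(urllib2.urlopen(url))

print("Documents indexed: %d" % response["response"]["numFound"])
for doc in response["response"]["docs"]:
    print("%s %s %s C" % (doc.get("hostname_s"),
                          doc.get("timestamp_dts"),
                          doc.get("temperature_f")))
 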
 Dashboard 
 You can easily add Banana to Solr to create a dashboard. 
 Review 
 If you successfully followed along with this tutorial, you should have MiniFi collecting data from your Sense HAT sensor on your Raspberry Pi. The MiniFi flow should be sending that data to NiFi on your computer, which then sends the data to Solr. 
						
					
Posted 02-03-2017 03:24 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 @Colin Cunningham
  I found that information on the Azure Marketplace page where you deploy the Sandbox.  We need to update the tutorial appropriately.
  
						
					
Posted 03-06-2017 07:29 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
 Hi everyone, 
 @jwhitmore Thanks for your response, you are right: when exposing port 50010, Talend for Big Data works (with the tHDFSConnect component and others). 
 But even with port 50010 exposed, we still get the same error when using Talend ESB with the Camel framework, see below: 
 [WARN ]: org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/Ztest.csv.opened could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
 I've written a Scala program, and I'm facing the same issue: 
 15:59:22.386 [main] ERROR org.apache.hadoop.hdfs.DFSClient - Failed to close inode 500495
org.apache.hadoop.ipc.RemoteException: File /user/hdfs/testscala2.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
 Any ideas? 
 Thanks in advance. 
 Best regards, 
 Mickaël. 
						
					
Posted 01-03-2017 06:38 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 @Neil Morin  Excellent, good to hear.  Please accept my answer to help others. 
						
					
Posted 03-01-2017 11:16 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
 It's not related to HDP 2.5.0; I just encountered the same on 2.4.3. It was resolved for me after changing these ambari.properties settings: 
 agent.threadpool.size.max
client.threadpool.size.max
 
 The values should be mapped to the actual CPU core count. I also increased the heap size for the NameNode and DataNode to 2 GB from the default 1 GB value. 
						
					
Posted 12-29-2016 02:45 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
 Thank you so much @irfan aziz for the confirmation. I'm accepting the answer given by @Michael Young. Please feel free to accept the appropriate answer if required. 
						
					
Posted 12-22-2016 02:47 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
 I am not sure about this error at the moment, but check https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo 
						
					
Posted 12-19-2016 01:25 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
 Back at it this morning and, while I don't quite get what's happening, I consider this resolved. This morning I did the following: 
 removed a prior install of Apache Zeppelin after I realized that, even after a reboot, it still responded to localhost:8080 
 confirmed it was indeed gone 
 started up VirtualBox and started the sandbox 
 then Zeppelin still responded to localhost:8080, which really confused me 
 then tried localhost:9995, to which a different Zeppelin page responded - so that was a good thing 
 then, remembering something from a previous experience, I tried 127.0.0.1:8080 and then Ambari responded with its login page 
 This is now the second time I have seen localhost and 127.0.0.1 be treated differently; one of these days I'll have to figure out why. But for now, I'm back in business and continuing the tutorial. 
 Thanks everyone for your help! 
 Cecil 
						
					
				
			
			
			
			
			
			
			
			
			
		 
         
					
				













