Member since: 09-23-2015
Posts: 42
Kudos Received: 91
Solutions: 8
My Accepted Solutions
Title | Views | Posted
---|---|---
| | 665 | 02-01-2016 08:56 PM
| | 1552 | 01-16-2016 12:40 PM
| | 3956 | 01-15-2016 01:14 PM
| | 3813 | 01-14-2016 09:37 PM
| | 5371 | 12-14-2015 01:02 PM
03-15-2017
01:02 PM
Just to clarify, https://issues.apache.org/jira/browse/NIFI-2613 currently only supports XSSF (.xlsx) based Excel documents. I do plan to add support for HSSF (.xls) in the near future.
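For context, XSSF and HSSF are Apache POI's names for the two Excel formats, and they are handled by different classes, which is why support for them can land separately. A quick hedged sketch with made-up file names, just to illustrate the split:

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook; // .xls  (HSSF, older binary format)
import org.apache.poi.xssf.usermodel.XSSFWorkbook; // .xlsx (XSSF, OOXML format)
import org.apache.poi.ss.usermodel.Workbook;
import java.io.FileInputStream;

public class ExcelFormats {
    public static void main(String[] args) throws Exception {
        // XSSF (.xlsx) - the format the processor currently handles
        try (FileInputStream in = new FileInputStream("report.xlsx")) {
            Workbook xlsx = new XSSFWorkbook(in);
            System.out.println("xlsx sheets: " + xlsx.getNumberOfSheets());
        }
        // HSSF (.xls) - the older format, planned for later
        try (FileInputStream in = new FileInputStream("report.xls")) {
            Workbook xls = new HSSFWorkbook(in);
            System.out.println("xls sheets: " + xls.getNumberOfSheets());
        }
    }
}
```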
09-14-2016
05:07 PM
6 Kudos
If I approached you on the street and challenged you to name a state capitol building from a flashcard image, how many of the 50 capitol buildings do you think you could recognize? That would be pretty difficult, right? We humans tend to have a narrow scope of knowledge focused around our intimate, day-to-day interactions. I would have no problem identifying Atlanta, Georgia's capitol building, for example, because I live in Atlanta and am exposed to it almost daily. If presented with an image of the Kansas capitol in Topeka, however, I would be stumped, since I have never been there or even seen a picture of that building. The point is that experience drives our ability to recognize images. Most millennials also tend to share these "experiences" to social media, blogs, text messages, etc. Google is dominant at crunching these digital "experiences" from its users and artfully marrying those impressions against validated datasets.

Google Vision is a REST API hosted and managed by Google that allows users to upload arbitrary images and perform services like landmark detection, label annotation, OCR, image properties, explicit content detection, face detection (along with sentiment), and corporate logo detection with amazing accuracy. Obviously this opens up a wide range of next-generation platform possibilities, but how do we use it? Google Vision can be accessed via a myriad of language SDKs, but the focus of this article is Google Vision's integration with Apache NiFi.

I quizzed myself and, shamefully, was only able to recognize 4 of the 50 state capitols. But how did Google and Apache NiFi do? Using Apache NiFi and the Google Vision API I was able to successfully detect 35 of the 50 state capitols from those same images! Don't believe me? Let's take a look at how I did it.

First up was the Google Vision API integration with Apache NiFi. Apache NiFi already has a robust set of tools for invoking REST APIs and handling JSON data. However, I prefer my workflows to remain clean and concise, so although the discrete components are there, I opted to create a custom GoogleVisionProcessor to condense those messy workflows into a single processor. The source code and instructions for using this processor can be found at https://github.com/jdye64/nifi-addons/tree/master/Processors/nifi-google-cloud. I also plan to contribute it to Apache in the coming weeks, after I iron out some more advanced features.

Let's take a look at the NiFi workflow and results from the experiment. As you can see, the GoogleVisionProcessor properly detected 35 out of 50 state capitols! The processor takes the JSON detection definition returned by Google Vision and creates a handy FlowFile attribute that lets us access that information from other Apache NiFi processors and do with it as we will. And just for reference, here is a more graphical representation of the same image landmark detection from Google.

I'm really excited about the new opportunities that combining Google's advanced analytics with Apache NiFi's agility will bring to end users. Much more information about Google Vision can be found at https://cloud.google.com/vision/ and https://cloud.google.com/blog/big-data/2016/09/around-the-world-landmark-detection-with-the-cloud-vision-api
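For the curious, the processor is ultimately just a client of the Vision REST endpoint, so the landmark check can be sketched without NiFi at all. Below is a minimal, hedged Java example of that kind of request; it assumes an API key exported as VISION_API_KEY and a made-up local image file, and it is not the GoogleVisionProcessor's actual code:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Scanner;

public class LandmarkDetection {
    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("VISION_API_KEY"); // assumes a key exported in the environment
        String image = Base64.getEncoder()
                .encodeToString(Files.readAllBytes(Paths.get("topeka_capitol.jpg"))); // hypothetical image

        // One annotate request asking only for landmark detection
        String body = "{\"requests\":[{"
                + "\"image\":{\"content\":\"" + image + "\"},"
                + "\"features\":[{\"type\":\"LANDMARK_DETECTION\",\"maxResults\":1}]}]}";

        URL url = new URL("https://vision.googleapis.com/v1/images:annotate?key=" + apiKey);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // The JSON response carries landmarkAnnotations with a description and score
        try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
            System.out.println(scanner.useDelimiter("\\A").next());
        }
    }
}
```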
- Find more articles tagged with:
- Data Ingestion & Streaming
- How-To/Tutorial
- NiFi
- nifi-processor
- ocr
- vision
09-07-2016
04:12 PM
4 Kudos
Repo Description
Apache MiNiFi is the next generation of data flow management and processing. MiNiFi allows data flow administrators to reach devices further out on the edge than previously possible. Given the lightweight nature of Apache MiNiFi, certain elements have been stripped away to allow for a smaller binary size when running on embedded devices, and a UI designer happened to be one of the things that had to go. Apache MiNiFi instead relies on data flow administrators to develop the flows they would like to run in Apache NiFi and then convert those XML templates to the MiNiFi-friendly YML format. That transformation normally requires downloading a toolkit and running it from the command line, which is not always possible for administrators operating in limited environments. This utility aims to help with that issue by allowing administrators/users to upload an existing NiFi XML template and get back the MiNiFi YML without ever invoking the command line toolkit.
Repo Info
Github Repo URL: https://github.com/jdye64/minifi-toolkit-ui
Github account name: jdye64
Repo name: minifi-toolkit-ui
- Find more articles tagged with:
- Data Ingestion & Streaming
- minifi
- minifi-yml
- NiFi
- nifi-templates
- utilities
09-07-2016
04:12 PM
3 Kudos
Repo Description
This project aims to give Apache MiNiFi C++ developers a much more convenient way of testing their code against several different operating systems and versions.
Repo Info
Github Repo URL: https://github.com/jdye64/minifi-cpp-devenvs
Github account name: jdye64
Repo name: minifi-cpp-devenvs
- Find more articles tagged with:
- Data Ingestion & Streaming
- docker
- minifi
- utilities
09-07-2016
02:06 PM
3 Kudos
Repo Description
OpenCV is a wildly popular image processing library, originally developed by Intel Research in 1999. OpenCV provides image manipulation, general image processing, and, most importantly in this context, object detection. This project aims to let users harness incoming images with Apache NiFi and perform object detection on them using trained models. The example shows face and eye detection; however, any trained OpenCV model can be used via an easy-to-configure JSON property that lets you specify the desired model.
Repo Info
Github Repo URL: https://github.com/jdye64/nifi-opencv
Github account name: jdye64
Repo name: nifi-opencv
- Find more articles tagged with:
- Data Ingestion & Streaming
- NiFi
- nifi-processor
- opencv
- sample-apps
07-21-2016
01:41 PM
3 Kudos
@Brandon Wilson I believe the best way to do this would be to install NiFi as a service on your system using $NIFI_HOME/bin/nifi.sh install
and then use an OS-level process management tool (supervisord on Ubuntu, for example) to monitor that process and restart it based on the configuration you provide to the process management tool.
05-24-2016
02:56 AM
12 Kudos
Recently I decided it was time to give my lawn a makeover. Years of brutally hot Atlanta summers have taken their toll on my grass and it's, well ... dead. I chose the do-it-yourself route and, as usual, went over budget and invested far more time than I had planned. Nonetheless, I now have a decent-looking backyard. Given my investment and how much I travel, I decided it would be worth the extra money to install an automated sprinkler system. Several of these exist on the market, so I set out doing my research. I chose Rachio (http://rachio.com) for its ability to control watering based on weather and other conditions, which is awesome! While Rachio is a great product with several features, it had a few shortcomings that I was hoping to supplement with my existing home automation setup (powered by Apache NiFi, of course). The idea was to use all of the existing features that Rachio offered and then use data from my local home automation setup to further supplement the watering system. There were two main features that Rachio didn't offer that I wanted to add.

- Dog (Zeke) location - My wife and I have the world's coolest dog (Zeke). He does have a few weaknesses, however, and water happens to be one of them. Since he spends much of his time in the fenced-in backyard by himself, I can't risk the sprinklers turning on while he is back there, or he will enter hyper puppy play mode and dig them all up. Worse yet, he will bring that mud back into the house with him once he is done. It is a must that the system understand when he is outside and not allow watering to occur.
- Outdoor gatherings - We use our backyard a lot and don't want unexpected waterings while we have guests back there. While Rachio allows you to control this manually with an app, I wanted a more automated approach that understood when we were in the backyard without any manual process.

After settling on the features I wanted to add, I set out to solve the technical implementation and landed on the approaches listed below.

- Dog (Zeke) location - This problem was a little tough to solve. I finally landed on installing an iBeacon (Gimbal Series 10) on Zeke's collar and setting up a custom Raspberry Pi BLE scanner that I had made for another project. The scanner is out of the scope of this blog, but at a high level it sits at his only entrance/exit to the yard and toggles between him being either inside or outside. It is a C++ and Python application that uses Linux BlueZ. An instance of Apache NiFi is also running on this Raspberry Pi, forwarding the JSON iBeacon payload to my NiFi master cluster for further analysis.
- Outdoor gatherings - Similar to tracking Zeke with his iBeacon collar, I have a separate wireless network in my backyard and use MikroTik RouterOS software to monitor for the MAC addresses of friends' and family's mobile phones. The logic: if a known MAC address is connected to that backyard network, then someone is back there and we should delay the sprinklers being turned on. Another instance of Apache NiFi gathers the output from RouterOS and sends that information to the main NiFi instance for further analysis.

To recap, I have three instances of Apache NiFi running. Two instances gather data at its point of inception and pass it along to the third instance, where the data is analyzed. That third instance also sends requests to the Rachio API to turn off the watering system if a dog or human is detected in the backyard. Let's take a look at the NiFi workflow of the third instance that ultimately controls the watering system. The workflow was created with out-of-the-box features and simple steps to follow. Clearly Apache NiFi is the Cadillac of integrating with other awesome 3rd party systems!
- Find more articles tagged with:
- Data Ingestion & Streaming
- How-To/Tutorial
- IoT
- NiFi
- raspberry
04-19-2016
11:41 PM
13 Kudos
I'm constantly amazed by what powerful things I can do with Apache NiFi in so few steps. I often challenge myself by saying "self, I bet you couldn't do X with NiFi". My confidence was challenged yesterday on a long flight back from Peru to Atlanta when I realized I couldn't perform OCR-type tasks with NiFi as it stands today. Perturbed by this fact, I set out to come up with a solution, which ultimately led me to create a NiFi Tesseract processor for performing OCR tasks natively from within Apache NiFi. It wasn't really until I was finished that I realized how useful this processor could be. The NiFi Tesseract processor would give me the ability to read anything from handwritten doctors' notes in healthcare systems to scanned children's book images.
In fact, I chose to demonstrate the latter by showing how to use Apache NiFi to perform OCR on an excerpt from Dr. Seuss's "The Cat in the Hat" and then feed the resulting text from the NiFi Tesseract processor to the Mac OS X "say" command to read the output aloud. I have included a screen recording session that shows Apache NiFi reading in a page from The Cat in the Hat and then speaking the results. Screen Recording - Using Apache NiFi to read children's books. Only 5 simple drag-and-drop processors for a computer to read a child's book! Thanks Apache NiFi!
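For anyone curious what sits under the covers, the same OCR-then-speak flow can be sketched in plain Java with the Tess4J bindings to Tesseract. This is only an illustration under the assumption that a Tesseract language pack is installed locally; the data path and image file name are made up, and this is not the processor's actual code:

```java
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import java.io.File;

public class ReadAloud {
    public static void main(String[] args) throws Exception {
        ITesseract ocr = new Tesseract();
        ocr.setDatapath("/usr/local/share/tessdata"); // path to installed Tesseract language data (assumption)
        ocr.setLanguage("eng");

        // OCR a scanned page image (file name is made up for the example)
        String text = ocr.doOCR(new File("cat_in_the_hat_page.png"));
        System.out.println(text);

        // Hand the recognized text to the Mac OS X "say" command, as in the demo
        new ProcessBuilder("say", text).inheritIO().start().waitFor();
    }
}
```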
- Find more articles tagged with:
- Data Ingestion & Streaming
- FAQ
- How-To/Tutorial
- NiFi
- nifi-processor
- tts
04-19-2016
09:40 PM
10 Kudos
Avro is a popular file format within the Big Data and streaming space. Avro has 3 important characteristics that make it a great fit for both Big Data and streaming applications.
- Avro files are self-describing, meaning the file can be opened and its schema definition viewed as standard JSON or inspected programmatically by numerous applications. This makes your application code much less brittle, since the schema information can be obtained from the incoming Avro file itself rather than manually defined in your code (a short sketch of this follows the list below).
- Avro handles a wide range of data types natively, including complex types, maps, arrays, and even raw bytes.
- Avro supports schema evolution, which comes in very handy in streaming systems where the data flowing through the system can change without notice.
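The self-describing point in particular is easy to verify with the plain Avro Java API. Here is a minimal sketch, separate from the article's NiFi flow, that opens an Avro data file (the file name is made up) and prints the schema embedded in it along with its records:

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import java.io.File;

public class InspectAvro {
    public static void main(String[] args) throws Exception {
        File avroFile = new File("weather.avro"); // hypothetical file name

        // The schema travels with the data, so a generic reader needs no schema up front
        try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
            Schema schema = reader.getSchema();
            System.out.println(schema.toString(true)); // pretty-printed JSON schema

            while (reader.hasNext()) {
                System.out.println(reader.next()); // each record, decoded with the embedded schema
            }
        }
    }
}
```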
Now that all of the pros of Avro have been called out, there is a problem: in reality we rarely encounter Avro files while ingesting/streaming data. Why is this? Mostly because storing data in the Avro format requires defining the Avro schema up front, and defining the schema up front requires planning and a knowledge of the Avro format itself that not everyone has. This is a shame given the many benefits Avro provides, and it was the driving factor for me creating the "InferAvroSchema" processor within Apache NiFi. InferAvroSchema exists to help end users who either don't have the time or the knowledge to create Avro schemas by hand. It overcomes the initial creation complexity and allows Apache NiFi users to quickly take more common flat data formats, like CSV, and transform them into Avro.
So now that we have a little background, let's get into the details of how we make this happen using Apache NiFi and InferAvroSchema. Before we start, let's look at the end result to get the big picture. Our end result is a workflow that loads a CSV file holding weather-specific data and converts it to an Avro file. The Weather.csv file is loaded using GetFile and then examined by the InferAvroSchema processor to determine the appropriate Avro schema. After the Avro schema is generated, the ConvertCSVToAvro processor uses that schema to convert the CSV weather data to an Avro file. The resulting Avro file is ultimately written back to a local file on the NiFi instance machine. While the CSV data certainly doesn't have to come from a local file (it could have come from FTP, S3, HDFS, etc.), that was the easiest to demonstrate here. There is a local CSV file on my Mac called "Weather.csv", which is loaded by the GetFile processor and placed, in its entirety, into a new NiFi FlowFile. Below is a snippet of the contents of "Weather.csv" to provide context. As you can see, the CSV data contains a couple of different weather data points for a certain zip code. We want to take this data from its original CSV format and convert it to an Avro file. So where do we start? First we use the "InferAvroSchema" processor to help us create our Avro schema definition without having to manually define it ourselves, since we are pretending we have no idea how to make Avro schemas by hand. InferAvroSchema can examine the contents of CSV or JSON data and recommend an Avro schema definition based on the data it encounters in the incoming FlowFile content. InferAvroSchema provides a lot of flexibility via its configuration, so let's go through the properties and what they mean. The screenshot above shows the available properties for InferAvroSchema. At first look it can be a little daunting, so let's break down what is happening by stepping through each property.
- Schema Output Destination - Controls where the Avro schema will be written once it has been generated. Remember, we are not converting the CSV to Avro with this processor; we are only creating the Avro schema. The actual work of converting the CSV to Avro is done by another processor (ConvertCSVToAvro). ConvertCSVToAvro requires an Avro schema to perform its duties, so it makes sense to place the resulting schema in a location that can be easily accessed by ConvertCSVToAvro when it is needed. That is why I have chosen to output the schema as an attribute on the FlowFile, so that I can use the NiFi Expression Language from within the ConvertCSVToAvro processor, as you will see later.
- Input Content Type - Lets the processor know what type of data is in the FlowFile content that it should try to infer the Avro schema from. In our example that is CSV, but JSON is also valid.
- CSV Header Definition - Since an Avro schema needs a name for each field it contains, this gives us a mechanism to provide them. This value can also be loaded from the CSV header line itself, but I placed it here just to demonstrate. You will notice the header line is also present in the Weather.csv screenshot above; we handle that in the next two properties.
- Get CSV Header Definition From Data - Since we manually specified the CSV header definition, we don't want to pull the header definition from the Weather.csv file itself. If we had chosen not to specify it manually, we could have set this to true to pull that value from the Weather.csv file.
- CSV Header Line Skip Count - The Weather.csv file does in fact contain a header line, but since we chose not to use it and defined the header manually, we need to make sure we skip that line so it is not treated as data by the schema inference logic. This is why the value is set to 1, to skip that first line.
- CSV Escape String - The character used to escape strings.
- CSV Quote String - The character used to quote CSV data.
- Pretty Avro Output - Pretty-prints the resulting Avro output or not. Strictly for aesthetic purposes.
- Avro Record Name - The name of the Avro record in the resulting Avro schema. You can set this value to whatever you desire; of course, it makes sense to name it something relative to the data, so here I have called it "Weather".
- Number of Records To Analyze - How many records the processor should analyze to determine the type (String, Long, etc.) of the data present in the CSV. 10 is the default and seems to be the sweet spot for accuracy and performance.
- Charset - The character encoding of the incoming FlowFile content. In this case Weather.csv is UTF-8 encoded, so that is what I have specified.

At this point you will have an Avro schema that was automatically generated based on the raw incoming CSV data, congratulations! This doesn't do us much good by itself, however. We still need to put that Avro schema to work and convert the original Weather.csv data into an Avro file. We do that in our next step with the ConvertCSVToAvro processor, whose configuration is described below. The properties for ConvertCSVToAvro are a little more straightforward, so we aren't going to go through them one by one. I do want to point out the value for Record Schema, however. You will notice it has a value of ${inferred.avro.schema}. Recall that in the InferAvroSchema processor above we told it to write the resulting Avro schema to a FlowFile attribute.
So now we are able to access that value here using the NiFi Expression Language. The name of the FlowFile attribute will always be inferred.avro.schema. At this point our Weather.csv data has been successfully converted to an Avro file and we can do whatever we desire with it. I chose to simply write the data back to another local file, which will be named "Weather.csv.avro". Here is a screenshot of that output.
- Find more articles tagged with:
- apache
- Avro
- csv
- Data Ingestion & Streaming
- How-ToTutorial
- NiFi
04-19-2016
09:01 PM
Can you please post the JSON coming off of AttributesToJSON? Changing "Include Core Attributes" alone will not solve your problem.
04-19-2016
08:02 PM
Ahh, I think I see what the problem is. I believe it is because you have "Include Core Attributes" set to true in AttributesToJSON, so some extra fields are getting introduced into the JSON that are not present in the database table. Please paste the content I mentioned earlier, however, so I can validate.
04-19-2016
07:59 PM
Ok, so the only way you should be seeing this is if the JSON isn't in the format ConvertJSONToSQL is expecting. The processor does a final Iterator<String> fieldNames = rootNode.getFieldNames(); and then performs a while loop over that iterator, incrementing a "fieldCount" variable each time. The only way you could see this is if the JSON isn't really what you think it is. I see the connection between "AttributesToJSON" and "ConvertJSONToSQL" has some FlowFiles in it. Can you right-click that connection, list the contents, and paste the exact contents of one of them here? I'm wondering if "AttributesToJSON" is doing something squirrely. I wrote it, so it's certainly possible ...
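For reference, the field-counting logic described above looks roughly like the following standalone sketch. It uses the current com.fasterxml Jackson API, where the equivalent of getFieldNames() is fieldNames(); the sample JSON keys are made up:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Iterator;

public class CountJsonFields {
    public static void main(String[] args) throws Exception {
        // A flat JSON object like AttributesToJSON might emit (field names are hypothetical)
        String json = "{\"id\":\"42\",\"name\":\"chris\",\"uuid\":\"abc-123\"}";
        JsonNode rootNode = new ObjectMapper().readTree(json);

        int fieldCount = 0;
        Iterator<String> fieldNames = rootNode.fieldNames(); // getFieldNames() in the older codehaus Jackson
        while (fieldNames.hasNext()) {
            System.out.println("field: " + fieldNames.next());
            fieldCount++;
        }
        System.out.println("fieldCount = " + fieldCount);
    }
}
```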
04-19-2016
07:46 PM
Your configuration looks valid to me. Can you post a screenshot showing your configuration for what is being written to the FlowFile content and fed to the ConvertJSONToSQL processor? It also might help to validate that the JSON payload you expect is actually in the FlowFile's content by using a LogAttribute processor with the "Log Payload" property set to true right before the ConvertJSONToSQL processor.
04-19-2016
07:13 PM
Ignore the comment about Phoenix. I see now from the "ENGINE=InnoDB" that you are using MySQL.
04-19-2016
07:11 PM
1 Kudo
Chris, can you validate that your DBCPConnectionPool controller service is pointing to the appropriate database instance? The ConvertJSONToSQL processor will attempt a "describe" using the Connection Service, and often this error is the result of that Connection Service not being pointed at the desired database. Or, if you are using a Phoenix table, be careful: the Phoenix JDBC driver is case sensitive and can make things a little more tricky.
04-06-2016
08:29 PM
3 Kudos
Since the number of salt buckets can only be set at table creation time, this can be a little tricky. It takes a small amount of foresight in understanding your needs for the table, i.e. whether it will be more read heavy or write heavy. A neutral stance is to set the number of salt buckets to the number of HBase RegionServers in your cluster. If you anticipate heavy write loads, increasing that to something around {HBase RegionServer count * 1.2} would increase the number of buckets by 20% and allow for a more distributed write load. Increasing the salt buckets too high, however, may reduce your flexibility when you perform range-based queries.
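Since the bucket count has to be declared in the DDL, here is a minimal hedged sketch of creating a salted table over the Phoenix JDBC driver. The table, columns, bucket count, and ZooKeeper quorum are placeholders, and the Phoenix client jar is assumed to be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateSaltedTable {
    public static void main(String[] args) throws Exception {
        // "zk-host:2181" stands in for your ZooKeeper quorum
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // e.g. 8 RegionServers * 1.2 is roughly 10 buckets for a write-heavy table
            stmt.execute("CREATE TABLE SENSOR_READINGS ("
                    + " SENSOR_ID VARCHAR NOT NULL,"
                    + " READ_TS TIMESTAMP NOT NULL,"
                    + " READING DOUBLE"
                    + " CONSTRAINT PK PRIMARY KEY (SENSOR_ID, READ_TS))"
                    + " SALT_BUCKETS = 10");
        }
    }
}
```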
02-01-2016
08:56 PM
2 Kudos
Wes - I know you are asking about the REST API in your question, but it seems to me that you would be better served pulling this information from Flume's JMX MBeans. It sounds like you are looking for lower-level metrics such as memory used, CPU, etc.
01-16-2016
12:40 PM
2 Kudos
@Narendra Bidari clientId and groupId are not the same. The clientId is a user-specified string value that is sent along with every request to help with tracing and debugging. The groupId, on the other hand, is a unique identifier for a group of consumer processes. Since the Kafka read offset is stored in ZooKeeper for your groupId, you don't start reading from the beginning of that topic again. This is also why you are able to read the entire topic when you change the topic name: no previous offset has been stored for it. Hope this helps.
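To make the distinction concrete, here is a small hedged sketch using the newer Java consumer API (the question's context, with offsets in ZooKeeper, is the older high-level consumer, but the two IDs play the same roles). The broker address, group, client ID, and topic are all made up:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Collections;
import java.util.Properties;

public class GroupVsClientId {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        // group.id: identifies the consumer group; committed offsets are tracked per group
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-pipeline");
        // client.id: a free-form label that only shows up in logs/metrics for tracing
        props.put(ConsumerConfig.CLIENT_ID_CONFIG, "nifi-node-1");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            // poll loop omitted; changing group.id (not client.id) changes where reading resumes
        }
    }
}
```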
01-15-2016
01:14 PM
2 Kudos
@Kausha Simpson ReplaceTextWithMapping works exactly like ReplaceText, with the exception that the ReplaceText property "Replacement Value" is defined in an external file. Unfortunately the format of that file is not very well documented. I have attached an example of that file and a sample workflow, but as a high-level overview: the mapping file is newline delimited, one mapping per line, with a tab (\t) character separating the mapping key from its desired replacement value. Good luck, and I hope this information helps.
01-14-2016
09:37 PM
3 Kudos
Ashwin - You can start the spark shell by running /usr/hdp/current/spark-client/bin/spark-shell in the sandbox.
01-14-2016
05:14 PM
1 Kudo
I agree with @jpercivall and @mpayne that ReplaceText is the best way to go. I created a quick workflow that you can reference, assuming the input of AABBBBCC as you suggested. You can change the GetFile path, the PutFile path, and the regex in ReplaceText to test with your real data: fixedwidthexample.xml
12-14-2015
01:02 PM
Hey Divya, there are a couple of ways to do this, but the main flow is as follows.
1. Load/parse the data into a DataFrame. It seems like you have already done this, but since you didn't pass along that snippet I'm just going to make something up. You did mention you were using the spark-csv package, so the example does the same.
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .option("inferSchema", "true") // Automatically infer data types
  .load("cars.csv")
2. Write the DataFrame to the HDFS location where you plan to create the Hive external table, or to the directory of an existing Hive table.
df.select("year", "model").write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("hdfs://hdfs_location/newcars.csv")
3. Create the external Hive table through a HiveContext.
val hiveSQLContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Several other options can be passed in here for other formats, partitions, etc.
hiveSQLContext.sql("CREATE EXTERNAL TABLE cars(year INT, model STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'hdfs://hdfs_location/newcars.csv'")
4. Query the Hive table with whatever query you wish.
// Queries are expressed in HiveQL
hiveSQLContext.sql("SELECT * FROM cars").collect().foreach(println)
12-02-2015
03:28 PM
Hortonworks has a tutorial that shows how to configure Solr to store index files in HDFS. Since HDFS is already a fault tolerant file system, does it mean that with this approach we can keep the replication factor of 1 for any collections (shards) that we create? It sounds like a lot of redundancy if we keep the default HDFS replication factor of 3 plus Solr replication on top of that.
Labels:
- Apache Solr
10-28-2015
12:17 AM
2 Kudos
Nice! I had made something similar a few months back so this might be my opportunity to share that as well https://github.com/jdye64/PhoenixRESTServer-Client
10-09-2015
12:53 PM
1 Kudo
It seems the issue is specific to the Falcon UI. The Falcon UI attempts to validate the S3 URI and enforces that it ends with amazonaws.com; however, this is not the format expected by "Jets3tNativeFileSystemStore", which distcp ultimately invokes. The format needs to be s3n://BUCKET/PATH. This was causing authentication to fail, even with the proper credentials in place, since the wrong endpoint was being hit. A workaround is to download the XML from the Falcon UI, edit the s3n URI manually, and then re-upload the XML file through the Falcon UI.
10-08-2015
07:23 PM
2 Kudos
Where do you define your AWS access key and secret key credentials for mirroring data from a local Falcon cluster to S3? I have the job set up, but it is failing because those values are not defined.
Labels:
- Apache Falcon
09-30-2015
01:23 PM
2 Kudos
You can do this with the Ambari REST API like so:
curl -u $USER:$PASS -i -H 'X-Requested-By: ambari' -X DELETE http://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/<cluster-name>/services/<service-name>
09-29-2015
01:42 PM
1 Kudo
@sshaw@hortonworks.com In this scenario your best bet is going to be MR-Streaming. MR-Streaming will read the data from the files in HDFS and present each input record (I'm assuming TextInputFormat, so each line of the file in that case) to your Python script for processing. This is handy in your scenario because it keeps you from having to invoke Python scripts from native Java MR code. Here is a really simple example. You can adjust the 'map.py' file to contain any logic you desire, or even use subprocess to call an existing Python script if desired.
09-29-2015
01:21 PM
@MCarter@hortonworks.com The easiest way to compile this is going to be with Maven. The list of dependencies is rather lengthy, and you will spend a good bit of time trying to include them all if you want to use javac. If you have never used Maven before, then it probably isn't installed on your Mac either. The easiest way to install Maven is with the Mac package manager Homebrew. You might not have that installed either, but it's a simple one-liner to install; more information on Homebrew can be found here. It is commonly used among developers on OS X.
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
After Homebrew (brew) is installed, run this command to install Maven:
brew install maven
Once you have successfully installed Maven, you can compile the project by changing to the directory where the pom.xml file is located and running:
mvn clean install package
Once that has completed you will see the resulting jar at ./target/simple-yarn-app-1.1.0.jar. Hope that helps.
09-26-2015
12:59 PM
3 Kudos
@sraghavan@hortonworks.com You can handle 1 and 2 out of the box with NiFi using the steps listed below. RabbitMQ is not currently supported natively.
1) MS SQL - Create a DBCPConnectionPool in NiFi by going to "Controller Settings" -> "+" -> "DBCPConnectionPool". Here you can define your connection string and point to the SQL Server JDBC jar. After you define the connection pool, return to your flow and insert an "ExecuteSQL" processor that references the DBCPConnectionPool you just created. From there it is pretty straightforward. Keep in mind the output from "ExecuteSQL" is Avro, so you might want to use a "ConvertAvroToJSON" processor if you don't want to work with Avro. At that point you would use "PutHDFS" to place the data in the appropriate Hive directory in HDFS.
2) CouchDB - Exact same process as MS SQL, except you would use the JDBC driver for CouchDB when setting up the DBCPConnectionPool.
3) RabbitMQ - NiFi (as of version 0.3.0) only supports ActiveMQ out of the box. Creating your own custom processor in Java to use in NiFi is supported, however, so you could create a RabbitMQ processor if this is really needed.