Understanding the NiFi example project

avatar
Super Collaborator

The example now at

http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/

is working fully for me, but the Hortonworks site does not explain how things work. Is such a document available? If not, can someone please help me understand how these components work, e.g. LogGenerator, GenerateFlowFile, ReplaceText, Aggregate data, Send Info, Send Error, etc.?

I have a similar task of parsing a log file, but without understanding how this example works I won't be able to do anything.

1 ACCEPTED SOLUTION

avatar

I'd recommend starting by reading the documentation about the philosophy behind NiFi, as well as the documentation of each processor you mention. This will explain the concepts of flow files, repositories, flows, content vs. attributes, etc.

http://nifi.apache.org/docs.html

9 REPLIES

avatar
Super Collaborator

For example, if I go inside LogGenerator I see 5 "GenerateFlowFile" processors, each with exactly the same settings. Why do we have these 5? Each one has a "ReplaceText" processor attached.

avatar
Super Collaborator

Hi Pierre,

I am reading these documents, but I still do not understand why 5 identical processors are used.

avatar
Super Collaborator

The documentation says the GenerateFlowFile processor generates files with random data, but nowhere in the processor configuration are the fields mentioned. So where do we specify, or how do we know, which fields this file has?

The ReplaceText processor is connected to GenerateFlowFile and has to know which fields to modify, right?

avatar

Flow files are made of 'attributes' and 'content'. GenerateFF generates random flow files, with or without content as you prefer. It is generally used to produce data to start a flow, and mostly for demonstration and testing purposes. The ReplaceText processor replaces only the content and does not modify the attributes. Why five processors? Simply to generate the different parts of the simulated logs you want to process. Just have a look at the configuration of each processor.
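To make the attributes vs. content distinction concrete, here is a minimal Python sketch. It is only a toy model, not NiFi code: the `FlowFile` class and the `generate_flow_file` and `replace_text` helpers are hypothetical names invented for illustration, and the replacement string is a made-up log line.

```python
from dataclasses import dataclass, replace
import uuid


@dataclass(frozen=True)
class FlowFile:
    """Toy model of a flow file: key/value attributes plus a content payload."""
    attributes: dict
    content: bytes = b""


def generate_flow_file(size: int = 0) -> FlowFile:
    # Hypothetical stand-in for a generator: a unique id attribute and filler content.
    return FlowFile(attributes={"uuid": str(uuid.uuid4())}, content=b"x" * size)


def replace_text(flow_file: FlowFile, replacement: str) -> FlowFile:
    # Swap the whole content for the replacement; attributes are carried over unchanged.
    return replace(flow_file, content=replacement.encode("utf-8"))


original = generate_flow_file(size=8)
rewritten = replace_text(original, "2016-09-01|171.1.0.235|DE|1")
print(rewritten.attributes == original.attributes)  # True
print(rewritten.content.decode("utf-8"))
```

The point of the sketch is that the generator does not need to know about any "fields": the filler content is simply thrown away and replaced as a whole.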

You can also start a processor without starting the next one in the flow. This queues up flow files in the connection between them. By right-clicking on the connection and then listing the queue, you will be able to see the attributes of each flow file as well as its content. I'm sure this will help you understand the why and how.

avatar
Super Collaborator

Hi Pierre,

Attributes, as I understand it, are key/value pairs, right? So what keys does GenerateFF generate?

I looked at the configuration of the five processors and they look exactly the same. What's the difference?

avatar

Correct.

As I said, you can see what is generated by starting a processor so that flow files are generated but not consumed by the next processor.

7275-screen-shot-2016-09-01-at-114618-pm.png

Then list the queue:

7276-screen-shot-2016-09-01-at-114628-pm.png

Then click on the Info button to have information displayed about the flow file:

7277-screen-shot-2016-09-01-at-114635-pm.png

And you can even see the content of the flow file or download it.

GenerateFF only generates what we call core attributes, such as the UUID (to uniquely identify a flow file), filename, path, etc.

Regarding the ReplaceText processors, they are not configured identically. Here are their configurations:

${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|DE|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|ITA|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|USA|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|IND|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|FR|${nextInt():mod(2):toString()}

For the purpose of the tutorial we want to generate random logs from different countries, hence the multiple processors.
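For readers unfamiliar with the expression language, the following Python sketch mimics what one of these expressions produces. It rests on assumptions: that `now():toNumber()` yields milliseconds since the epoch and that `nextInt()` is an increasing counter. The `simulate_log_line` function is a hypothetical helper, and the ISO 8601 timestamp is chosen for illustration only, since NiFi renders `now()` in its own date format.

```python
from datetime import datetime, timezone

COUNTRIES = ["DE", "ITA", "USA", "IND", "FR"]


def simulate_log_line(epoch_ms: int, counter: int, country: str) -> str:
    # Assumption: timestamp rendered as ISO 8601 in UTC; NiFi's own rendering differs.
    timestamp = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()
    # The IP octets are derived from the clock value with modulo arithmetic.
    ip = f"17{epoch_ms % 9}.1.{epoch_ms % 25}.{epoch_ms % 255}"
    # The last field alternates between 0 and 1 as the counter increases.
    flag = counter % 2
    return f"{timestamp}|{ip}|{country}|{flag}"


# One line per country, mirroring the five processors.
for counter, country in enumerate(COUNTRIES):
    print(simulate_log_line(1000, counter, country))
```

Each generated line has four pipe-separated fields: timestamp, pseudo-random IP address, country code, and a 0/1 flag. Only the country code differs between the five configurations.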

avatar
Super Collaborator

Thanks Pierre, now it's beginning to make some sense. So the 5 GenerateFF processors are there to take care of the 5 countries, I guess.

I want to read my own log file. Which processor would I use? I want to start with a simple task: read my log file, parse out some values using regular expressions, and then save the parsed values to Hive.
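As an illustration of the regular-expression step only, here is a minimal Python sketch. It assumes log lines in the pipe-delimited format of the tutorial (timestamp, IP, country, flag); the `LOG_LINE` pattern, the `parse_log_line` helper, and the sample line are hypothetical, and a real log file would need its own pattern. It does not cover reading the file or writing to Hive.

```python
import re

# Named groups for the four pipe-separated fields of the assumed format.
LOG_LINE = re.compile(
    r"^(?P<timestamp>[^|]*)\|"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\|"
    r"(?P<country>[A-Z]+)\|"
    r"(?P<flag>[01])$"
)


def parse_log_line(line: str):
    # Return the captured fields as a dict, or None if the line does not match.
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = "Thu Sep 01 23:46:18 UTC 2016|171.1.0.235|DE|1"  # made-up sample line
print(parse_log_line(sample))
print(parse_log_line("not a log line"))  # None
```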