Member since 04-11-2016
471 posts, 325 kudos received, 118 solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 2130 | 03-09-2018 05:31 PM |
| 2695 | 03-07-2018 09:45 AM |
| 2594 | 03-07-2018 09:31 AM |
| 4466 | 03-03-2018 01:37 PM |
| 2511 | 10-17-2017 02:15 PM |
09-02-2016
06:15 PM
Could you share what you find in the application log file (./logs/nifi-app.log)?
09-01-2016
09:52 PM
Correct. As I said, you can see what is generated by starting a processor so that flow files are produced but not consumed by the next processor. Then list the queue and click the Info button to display information about a flow file. You can even view the content of the flow file or download it. GenerateFlowFile only generates what we call core attributes, such as the UUID (to uniquely identify a flow file), filename, path, etc. Regarding the ReplaceText processors, this is not true. Here are their configurations:

```
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|DE|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|ITA|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|USA|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|IND|${nextInt():mod(2):toString()}
${now()}|17${now():toNumber():mod(9):toString()}.1.${now():toNumber():mod(25):toString()}.${now():toNumber():mod(255):toString()}|FR|${nextInt():mod(2):toString()}
```

For the purpose of the tutorial, we want to generate random logs from different countries, hence the multiple processors.
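To make the ReplaceText expressions concrete, here is a rough Python sketch that rebuilds one replacement value by hand. It is an illustration only, not how NiFi evaluates Expression Language. The function name `simulate_log_line` and its parameters are hypothetical: `now_text` stands in for `${now()}` (whose rendered format is not assumed here), `now_millis` for `${now():toNumber()}` in epoch milliseconds, and `seq` for `${nextInt()}`. Note that the NiFi Expression Language guide describes `nextInt()` as a one-up counter rather than a random generator, so `mod(2)` should alternate between 0 and 1. Also, the three address parts are all derived from the same timestamp.

```python
def simulate_log_line(now_text: str, now_millis: int, country: str, seq: int) -> str:
    """Rebuild one ReplaceText replacement value by hand (illustration only).

    now_text   -- stands in for ${now()} as rendered by NiFi (format not assumed)
    now_millis -- stands in for ${now():toNumber()}, epoch milliseconds
    seq        -- stands in for ${nextInt()}, NiFi's one-up counter
    """
    # 17x.1.y.z where x, y, z are the timestamp modulo 9, 25 and 255
    ip = f"17{now_millis % 9}.1.{now_millis % 25}.{now_millis % 255}"
    # ${nextInt():mod(2)} alternates 0, 1, 0, 1 ... as the counter increases
    flag = seq % 2
    return f"{now_text}|{ip}|{country}|{flag}"


if __name__ == "__main__":
    # One line per country, roughly as the five ReplaceText processors would produce
    for i, country in enumerate(["DE", "ITA", "USA", "IND", "FR"]):
        print(simulate_log_line("<now>", 1472766720000 + i, country, i))
```

For example, `simulate_log_line("T", 1000, "DE", 0)` returns `"T|171.1.0.235|DE|0"`.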
09-01-2016
09:42 PM
3 Kudos
Hi, this is now available in the left panel, as a result of the new multi-tenancy and ACL improvements. Hope this helps.
09-01-2016
08:53 PM
Flow files are made of 'attributes' and 'content'. GenerateFlowFile generates flow files with random content (or with no content, if you prefer). It is generally used to produce data to kick-start a flow, mainly for demonstration and testing purposes. The ReplaceText processor only replaces content; it does not modify the attributes. Why five processors? Simply to generate the different parts of the simulated logs you want to process. Have a look at the configuration of each processor. You can also start a processor without starting the next one in the flow. This queues up flow files in the connection. By right-clicking the connection and selecting 'List queue', you can see the attributes of each flow file as well as its content. I'm sure this will help you understand the why and the how.
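The attributes-versus-content split can be pictured with a small Python model. This is a hypothetical toy, not NiFi's API: `FlowFile`, `generate_flow_file` and `replace_text` are made-up names, and the `filename` and `path` values are placeholders rather than NiFi's real defaults. It only shows the idea from the post: a generator attaches core attributes, and a ReplaceText-style step rewrites the content while carrying the attributes over unchanged.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlowFile:
    """Toy model of a flow file: key/value attributes plus byte content."""
    attributes: dict
    content: bytes = b""


def generate_flow_file(content: bytes = b"") -> FlowFile:
    # Roughly what GenerateFlowFile does: core attributes only.
    # The filename and path values are placeholders, not NiFi's defaults.
    return FlowFile(
        attributes={"uuid": str(uuid.uuid4()), "filename": "example", "path": "./"},
        content=content,
    )


def replace_text(flow_file: FlowFile, new_text: str) -> FlowFile:
    # Roughly a ReplaceText that overwrites the whole content:
    # the content changes, the attributes are carried over untouched.
    return replace(flow_file, content=new_text.encode("utf-8"))


if __name__ == "__main__":
    original = generate_flow_file(b"\x00\x01random")
    updated = replace_text(original, "T|171.1.0.235|DE|0")
    print(updated.attributes == original.attributes)  # True
    print(updated.content)
```

Making the model immutable mirrors the fact that a NiFi session hands back a new flow file reference after each write, instead of mutating the one you passed in.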
09-01-2016
07:29 PM
3 Kudos
I'd recommend starting by reading the documentation about the philosophy behind NiFi, as well as the documentation of each processor you mention. It explains the concepts of flow files, repositories, flows, content vs. attributes, etc.: http://nifi.apache.org/docs.html
08-30-2016
12:24 PM
1 Kudo
OK, so I gave it a quick try. If any Groovy gurus have feedback, don't hesitate. I have simulated the following data flow (template split-execute-script.xml):

GenerateFlowFile -> ReplaceText -> ExecuteScript -> PutFile

GenerateFlowFile and ReplaceText are only used to generate flow files matching your requirements. ExecuteScript has the following body:

```groovy
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Read the incoming content line by line as UTF-8
    inputStream.eachLine('UTF-8') { line ->
        // Split on '|'; the -1 limit keeps trailing empty fields.
        // Assumes every line has at least 9 pipe-separated fields.
        def columns = line.split("\\|", -1)
        // Keep columns 0, 1, 7 and 8 and write them out as a CSV line
        def out = [columns[0], columns[1], columns[7], columns[8]].join(",") + "\n"
        outputStream.write(out.getBytes(StandardCharsets.UTF_8))
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```

There may be a better version of this code, but it does the job. Please try it with your data to confirm, but I think it will meet your performance expectations. If you are dealing with huge files, you may want to first split your data into smaller chunks and then merge the results back, in order to spread the load across nodes (in a cluster configuration) and leverage multithreading. Let me know if you have any questions.
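Before wiring the script into NiFi, it can help to check the column selection and the split-then-merge idea on a sample outside NiFi. The Python sketch below is a local simulation under stated assumptions. The names `KEEP`, `to_csv_line`, `chunks` and `split_transform_merge` are hypothetical, and the chunks are processed one after another here, whereas in NiFi they would be separate flow files handled by several threads or nodes.

```python
from typing import Iterator, List

# Same column indices as the Groovy script: 0, 1, 7 and 8
KEEP = (0, 1, 7, 8)


def to_csv_line(line: str) -> str:
    # str.split keeps trailing empty fields, like split("\\|", -1) in Groovy
    columns = line.split("|")
    return ",".join(columns[i] for i in KEEP)


def chunks(lines: List[str], size: int) -> Iterator[List[str]]:
    # Stand-in for splitting a big file into fixed-size pieces
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def split_transform_merge(text: str, size: int) -> str:
    # Transform each chunk independently, then concatenate in the original order
    parts = [
        "".join(to_csv_line(line) + "\n" for line in chunk)
        for chunk in chunks(text.splitlines(), size)
    ]
    return "".join(parts)
```

For instance, `split_transform_merge("a|b|c|d|e|f|g|h|i\n1|2|3|4|5|6|7|8|9", 1)` returns `"a,b,h,i\n1,2,8,9\n"`, the same result as with a single chunk.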
08-30-2016
09:23 AM
2 Kudos
@boyer In this case, you can set the 'Max Bin Age' property so that after a given amount of time the merging process occurs even if the group size condition is not met. Hope this helps.
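The effect of 'Max Bin Age' can be sketched as a simple decision rule. This is a simplified toy model, not MergeContent's implementation. `Bin` and `ready_to_merge` are hypothetical names, and only a minimum entry count is modeled (size thresholds and the maximum limits are ignored). A bin is merged once it holds enough flow files, or, if a maximum age is set, once it has waited long enough regardless of how many flow files it holds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bin:
    """Toy model of one MergeContent bin (not NiFi's implementation)."""
    created_at: float  # seconds; when the first flow file entered the bin
    entries: List[str] = field(default_factory=list)


def ready_to_merge(b: Bin, now: float, min_entries: int,
                   max_bin_age: Optional[float]) -> bool:
    # Normal case: enough flow files have been grouped together
    if len(b.entries) >= min_entries:
        return True
    # 'Max Bin Age': merge whatever is in the bin once it is old enough
    return max_bin_age is not None and now - b.created_at >= max_bin_age
```

With a minimum of 10 entries and a maximum age of 30 seconds, a bin holding two flow files is not merged after 5 seconds but is merged after 30. Without a maximum age it would wait indefinitely.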
08-30-2016
09:19 AM
@sam coderunner Your comments are correct. I was not expecting such a performance degradation, but you are probably right that 21 columns are not helping. I will run some tests on my side to check whether performance can be improved. But I clearly agree with you: in such a case, I believe the ExecuteScript processor would be a better fit to solve the issue. It's really easy to write a few lines of Groovy (for example) to do what you are looking for. Let me know if you need any help with this.
08-29-2016
03:37 PM
@Alvin Ji,
This is correct for NiFi 0.x. Unless you implement your own MapCacheServer service and run it separately from NiFi, I am not sure there is a solution. With NiFi 1.x (the first version should be released in the coming days; the RC vote is in progress), this is solved by the zero-master clustering paradigm.