Member since
07-14-2017
99
Posts
5
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1446 | 09-05-2018 09:58 AM | |
1965 | 07-31-2018 12:59 PM | |
1440 | 01-15-2018 12:07 PM | |
1343 | 11-23-2017 04:19 PM |
01-16-2018
07:58 AM
Could you please let me know how I can merge this pull request into our current instance so that we can access the processor?
10-30-2017
05:00 PM
1 Kudo
@Hadoop User
Use the ListFile processor and run it on a cron schedule every minute. This processor stores state and won't return any warning if there is no new file. Then use the FetchFile processor to pull the files listed by ListFile. Unlike GetFile, these processors do not delete the file from your directory by default once the fetch is done. If you want to delete those files from the directory, use the ExecuteStreamCommand processor with a shell script, and pass the filename from the flowfile attribute to that script.
Flow:-
1.ListFile //list all the files from directory.
2.FetchFile //fetch the listed file.
3.ExecuteStreamCommand //shell script to delete file from directory.
Refer to the link below for how to pass attributes to the ExecuteStreamCommand processor script. https://pierrevillard.com/2016/03/09/transform-data-with-apache-nifi/
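The delete step in the flow above could be sketched as follows. This is only an illustration, written in Python rather than as a shell script; the file name `delete_fetched_file.py` and the idea of passing `${absolute.path}` and `${filename}` as two separate command arguments are assumptions, not something taken from the linked article.

```python
import os


def delete_fetched_file(directory, filename):
    """Remove directory/filename; return True if a file was deleted, False if none existed."""
    target = os.path.join(directory, filename)
    try:
        os.remove(target)
        return True
    except FileNotFoundError:
        # The file is already gone (for example, deleted by an earlier run).
        return False


def main(argv):
    """Entry point: argv is [script, directory, filename]; returns a process exit code."""
    if len(argv) != 3:
        print("usage: delete_fetched_file.py <directory> <filename>")
        return 2
    deleted = delete_fetched_file(argv[1], argv[2])
    print("deleted" if deleted else "already gone", os.path.join(argv[1], argv[2]))
    return 0
```

To run it as a script, end the file with `import sys; sys.exit(main(sys.argv))` and have ExecuteStreamCommand call the Python interpreter with the script path, the directory attribute and the filename attribute as its arguments.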
10-31-2017
07:18 PM
1 Kudo
@Hadoop User, the MergeContent minimum group size depends on your input file size. In the MergeContent processor, change the Correlation Attribute Name property to filename //it will bin all the chunks that have the same filename and merge them.
<strong>Minimum Number of Entries</strong> //the minimum number of flowfiles to include in a bundle; it needs to be at least equal to the number of chunks you get after the SplitText processor. Maximum Number of Entries //the maximum number of flowfiles to include in a bundle. <strong>Minimum Group Size</strong> //the minimum size of the bundle; this should be at least your file size, otherwise some of your data will not be merged.
Max Bin Age //the maximum age of a bin that will trigger the bin to be complete, i.e. after that many minutes the processor flushes out whatever flowfiles are waiting before the processor.

In the above screenshot I have the Correlation Attribute Name property set to filename, which means all the chunks that have the same filename will be grouped as one. The processor waits for a minimum of 2 files to merge and a maximum of 1000 files, and it also checks the min and max group size properties. If your flow satisfies these properties, the MergeContent processor won't have any files waiting before it. If your flow does not meet the configuration above, then we need to use the Max Bin Age property to flush out all the files that are waiting before the processor. As you can see in my configuration I have given 1 minute, so the processor will wait for 1 minute and, if no more flowfiles with the same correlation attribute arrive, it flushes out what it has. In your case you need to define the value as per your requirements.

For your reference:
Ex1:- Let's consider your file size is 100 MB and after SplitText we have 1000 chunks. Then your MergeContent configuration will look like:
Minimum Number of Entries 1
Maximum Number of Entries 1000
Minimum Group Size 100 MB //at least equal to your file size.
case1:- If one flowfile is 100 MB in size, then the Maximum Number of Entries property is ignored, because min entries is 1 and min group size is 100 MB; the bin satisfies the minimum requirements, so the processor will merge that file.
case2:- If 1000 flowfiles each have 10 MB size, then the Minimum Group Size property is ignored, because max entries is 1000; the bin satisfies the max requirement, so the processor will merge those files and the 1000 chunks are merged into 1 file.

Ex2:- Let's consider your file size is 95 MB and after SplitText we have 900 chunks. The challenge in this case is that the processor with the above configuration will not merge the 900 chunks, because the bin hasn't reached the Minimum Group Size of 100 MB (we only have 95 MB), but we still need to merge that file. In this case your MergeContent configuration will look like:
Minimum Number of Entries 1
Maximum Number of Entries 1000 //equal to the number of chunks
Minimum Group Size 100 MB //at least equal to your file size
Max Bin Age 1 minute
case1 and case2 behave the same as in Ex1. We need to add Max Bin Age: this property helps when files are waiting before the processor. After 1 minute it flushes out those files and merges them according to the filename attribute correlation.

By analyzing your GetFile, SplitText and ReplaceText processors (size, count), you need to configure the MergeContent processor.
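The Ex1 and Ex2 cases can be checked with a small model of the rules as they are described in this post. It is a simplified sketch, not NiFi's actual implementation; the names `MergeSettings` and `bin_decision` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeSettings:
    min_entries: int
    max_entries: int
    min_group_size_mb: float
    max_bin_age_min: Optional[float] = None  # None means Max Bin Age is not set


def bin_decision(settings, entries, size_mb, age_min):
    """Return why a bin would be merged, or None if it keeps waiting (simplified model)."""
    if entries >= settings.max_entries:
        return "max entries reached"
    if entries >= settings.min_entries and size_mb >= settings.min_group_size_mb:
        return "minimums satisfied"
    if settings.max_bin_age_min is not None and age_min >= settings.max_bin_age_min:
        return "max bin age expired"
    return None


ex1 = MergeSettings(min_entries=1, max_entries=1000, min_group_size_mb=100)
print(bin_decision(ex1, entries=1, size_mb=100, age_min=0))        # Ex1 case1
print(bin_decision(ex1, entries=1000, size_mb=10000, age_min=0))   # Ex1 case2
print(bin_decision(ex1, entries=900, size_mb=95, age_min=5))       # Ex2 without Max Bin Age

ex2 = MergeSettings(min_entries=1, max_entries=1000, min_group_size_mb=100, max_bin_age_min=1)
print(bin_decision(ex2, entries=900, size_mb=95, age_min=1))       # Ex2 with Max Bin Age
```

In this model the 95 MB bin only gets merged once Max Bin Age is set, which is the point of Ex2.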
10-13-2017
04:01 PM
@Hadoop User If merging FlowFiles and adding more concurrent tasks to your PutHDFS processor helped with your performance issue here, please take a moment to click "accept" on the above answer to close out this thread. Thank you, Matt
10-17-2017
02:19 PM
@Shawn Weeks I have found the solution. The issue was with the principal, which has permission validation. Thanks for your help.
11-23-2017
04:19 PM
I have sorted it. Closing the question.
08-03-2017
08:50 AM
@Michael Young Thanks for the suggestion, I have started trying the approach. 1. I used GetHDFS to get the file. 2. I split the file on lines (count=1). Here I have a doubt about extracting: if I am not wrong, I need to extract each attribute using the ExtractText processor. Today I have 10 attributes; suppose I want to extend that to 1000, is the same approach to be followed? It becomes lengthy, doesn't it? Also, the K:V pairs are not comma separated, they are space separated, and any value could have a space in the middle of it. e.g.: source="abc def ghi jkl" destination="abcdefabc" I am a bit confused, please advise.
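To make the format in question concrete, here is a minimal Python sketch of how such a line could be tokenized without one extraction rule per key. It is not a NiFi configuration; the function name `parse_key_values` is hypothetical, and it assumes every value containing spaces is wrapped in double quotes as in the example.

```python
import shlex


def parse_key_values(line):
    """Split a line of space-separated key="value" pairs into a dict."""
    pairs = {}
    # shlex.split keeps quoted text together, so spaces inside values survive.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that have no '='
            pairs[key] = value
    return pairs


record = parse_key_values('source="abc def ghi jkl" destination="abcdefabc"')
print(record)  # {'source': 'abc def ghi jkl', 'destination': 'abcdefabc'}
```

Because the keys are discovered from the line itself, going from 10 to 1000 attributes would not change the code.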
08-02-2017
09:21 AM
@Wynner I have replaced the RouteOnContent processor but kept the parameters the same. Surprisingly, it works pretty fast (seconds). Not sure why the old one was not working. Thanks for your extended support.