Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11248 | 04-15-2020 05:01 PM |
| | 7158 | 10-15-2019 08:12 PM |
| | 3135 | 10-12-2019 08:29 PM |
| | 11575 | 09-21-2019 10:04 AM |
| | 4368 | 09-19-2019 07:11 AM |
07-04-2018
08:41 PM
@Vengai Magan Please refer to this and this; those links describe how to install NiFi as a service. This link explains how to set up NiFi for high performance.
07-03-2018
11:35 AM
@Raju Chigicherla Instead of splitting the 10 GB file in a single SplitText processor, try a series of SplitText/SplitContent processors. Alternatively, use record-oriented processors such as SplitRecord and configure the records-per-split so that each split comes out around 50 MB; if a single SplitRecord processor still gives you issues, chain a series of SplitRecord processors to reach 50 MB files. For splitting XML files, NiFi 1.7 introduced the XmlReader/Writer controller services, which let the SplitRecord processor split XML data as well. Refer to this and this for how to split a big file using a series of Split processors.
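Outside NiFi, the staged line-based splitting idea can be sketched in plain Python (a toy illustration of what a chain of SplitText processors achieves, not NiFi itself; the function name and `chunk_bytes` parameter are my own):

```python
def split_by_lines(path, chunk_bytes=50 * 1024 * 1024):
    """Split a large text file into line-aligned chunks of roughly
    chunk_bytes each, without loading the whole file into memory."""
    chunks, current, size = [], [], 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            current.append(line)
            size += len(line.encode("utf-8"))
            if size >= chunk_bytes:
                # Close the current chunk once it reaches the target size.
                chunks.append("".join(current))
                current, size = [], 0
    if current:
        chunks.append("".join(current))
    return chunks
```

The key point mirrored here is that splits happen on line boundaries, so no record is cut in half, which is also why record-oriented splitting (SplitRecord) is the safer choice for structured data.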
07-03-2018
11:27 AM
@Vengai Magan To parse a fixed-width file you can use the ReplaceText processor: write a matching regex that extracts each field into a capture group, then insert a delimiter between the groups in the replacement value. If you know how wide each field is, capture each field into its own group and replace the content with the delimited values. If your file is space-delimited, you can use the regex (.*)\s(.*) and replace with your delimiter. Set the ReplaceText Evaluation Mode to Line-by-Line, so each line of the fixed-width file is read and its content rewritten with delimiters. Once the fields are delimited, you can use the ConvertRecord processor to read and write the data in your required format. If the file is a couple of gigabytes, it is better to split it into small chunks before ReplaceText and feed the split files to that processor. In addition, ConvertRecord supports ScriptedReader/ScriptedWriter controller services, which read the incoming flowfile using a script you provide and write the flowfile contents according to the configured writer. Some references on parsing fixed-width files are here and here; references on splitting a big CSV file into smaller chunks are here and here.
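The capture-group-plus-delimiter idea can be checked quickly in Python's `re` module (an illustration only, with a made-up three-field layout; NiFi's ReplaceText uses Java regex, which behaves the same for patterns like this):

```python
import re

# Hypothetical fixed-width layout: name (8 chars), code (4 chars), qty (5 chars).
FIXED = re.compile(r"^(.{8})(.{4})(.{5})$")

def to_delimited(line, sep="|"):
    """Capture each fixed-width field in a group and rejoin with a
    delimiter, the same idea as the ReplaceText regex/replacement above."""
    m = FIXED.match(line)
    if m is None:
        raise ValueError("line does not match the fixed-width layout")
    return sep.join(g.strip() for g in m.groups())
```

For example, `to_delimited("apples  A1  12   ")` produces `apples|A1|12`, which a record reader can then parse with `|` as the delimiter.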
07-02-2018
10:24 AM
@Souveek Ray Use the regexp_extract function. Since you are splitting on a space, write a matching regex whose first capture group extracts everything up to the first space (the first word) and whose second capture group excludes the first word. Example, with firstword secondword thirdword as the value:

hive> select regexp_extract("firstword secondword thirdword","^(.*?)\\s(.*)",1) first_word,
             regexp_extract("firstword secondword thirdword","^(.*?)\\s(.*)",2) not_first_word;
+-------------+-----------------------+--+
| first_word  | not_first_word        |
+-------------+-----------------------+--+
| firstword   | secondword thirdword  |
+-------------+-----------------------+--+

In the example above, regexp_extract with capture group 1 returns the first word, and with capture group 2 it returns everything after it (excluding the first word).
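The same pattern can be sanity-checked in Python's `re` module (a quick illustration; Hive uses Java regex, which treats this particular pattern the same way):

```python
import re

# Lazy (.*?) stops at the first space; greedy (.*) takes the rest.
m = re.match(r"^(.*?)\s(.*)", "firstword secondword thirdword")
first_word = m.group(1)       # everything up to the first space
not_first_word = m.group(2)   # everything after the first space
print(first_word)      # firstword
print(not_first_word)  # secondword thirdword
```

The lazy quantifier in group 1 is what guarantees the split happens at the first space rather than the last one.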
06-29-2018
09:29 AM
@Murat Menteşe Since your XML doc has an array [] in it, I'm not sure how to write a matching XSLT: the current XSLT converts the array XML into an object/element and adds "" for the array []. For large data you have to increase the Maximum Buffer Size (default 1 MB) based on your flowfile size, because this processor takes the whole flowfile into memory and does all the replacing according to the configured properties.
06-29-2018
09:18 AM
@Amira khalifa Use one of the methods from the link shared above to take only the header out of the CSV file. Then, in ReplaceText, search for (&|\(|\)|\/_|\s) and set the Replacement Value to an empty string; this searches the header flowfile for all the special characters and replaces them with nothing. Finally, merge this header flowfile back with the non-header flowfile. The full explanation and template.xml are shared in this link.
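The same search-and-strip can be tried in Python's `re` module before wiring it into ReplaceText (the sample header string is made up for illustration):

```python
import re

# Matches any of: & ( ) /_ or whitespace, exactly as in the ReplaceText search value.
pattern = r"(&|\(|\)|\/_|\s)"

header = "first name&(last name)/_email"
cleaned = re.sub(pattern, "", header)  # replace every match with empty string
print(cleaned)  # firstnamelastnameemail
```

Testing the regex this way first makes it easy to confirm no wanted characters are being stripped before applying it to the header flowfile.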
06-27-2018
09:40 PM
1 Kudo
@Abhilash Chandrasekharan Try the below syntax:

hive> select regexp_replace('HA^G^FER$JY',"\\^","\\$");
+---------------+--+
|      _c0      |
+---------------+--+
|  HA$G$FER$JY  |
+---------------+--+

Since ^ and $ are reserved characters in regex, use two backslashes to escape them in the regexp_replace function.
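The same escaping concern shows up in Python's `re` module (a quick cross-check; in a Python raw string a single backslash escapes the caret, whereas Hive string literals need the doubled backslash shown above):

```python
import re

# ^ is a regex metacharacter, so it must be escaped in the pattern.
# In re.sub's replacement string a plain $ has no special meaning.
result = re.sub(r"\^", "$", "HA^G^FER$JY")
print(result)  # HA$G$FER$JY
```

The extra backslash in the Hive version exists because the string literal layer consumes one backslash before the regex engine ever sees the pattern.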
06-27-2018
12:03 PM
1 Kudo
@Vladislav Shcherbakov Before the ReplaceText processor, use an EvaluateJsonPath processor to extract the JSON values and keep them as flowfile attributes. Add all of your properties (case sensitive) in this processor, set the Destination to flowfile-attribute, then feed the success relationship from EvaluateJsonPath to the ReplaceText processor. Flow:

... other processors ...
3. SplitJson
5. EvaluateJsonPath
6. ReplaceText
06-27-2018
11:57 AM
1 Kudo
@Vladislav Shcherbakov You can change the schedule of the ExecuteSQL processor via Right Click --> Configure --> Scheduling tab. ExecuteSQL triggers on either Timer driven or CRON driven scheduling. As shown above, a Run Schedule of 1111111110 sec means the processor triggers once every 1111111110 sec; you can also use units like 1 min or 1 hr instead of specifying seconds. For CRON driven scheduling, use this link to build your cron expression, then change the Scheduling Strategy to CRON driven and put the expression in Run Schedule. Note that the ExecuteSQL processor does not store state; if you want to pull data incrementally, use QueryDatabaseTable or GenerateTableFetch processors instead, as these store state based on the configured maximum-value column and pull only the changes made after the stored state.
06-27-2018
04:02 AM
1 Kudo
@Murat Menteşe Could you share your sample XML file data so that I can reproduce the issue on my side by converting it to JSON format?