Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11248 | 04-15-2020 05:01 PM |
| | 7158 | 10-15-2019 08:12 PM |
| | 3135 | 10-12-2019 08:29 PM |
| | 11575 | 09-21-2019 10:04 AM |
| | 4368 | 09-19-2019 07:11 AM |
07-04-2018
08:41 PM
@Vengai Magan Please refer to this and this; those links describe how to install NiFi as a service. This link explains how to set up NiFi for high performance.
07-03-2018
11:35 AM
@Raju Chigicherla Instead of splitting the 10 GB file in a single SplitText processor, try a series of SplitText/SplitContent processors. Alternatively, use record-oriented processors such as SplitRecord and configure the records-per-split so that each split comes out around 50 MB; if a single SplitRecord processor still gives you issues, chain a series of SplitRecord processors to reach 50 MB files. For splitting XML files, NiFi 1.7 introduced the XmlReader/Writer controller services, which let the SplitRecord processor split XML data as well. Refer to this and this for how to split a big file using a series of Split processors.
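Outside NiFi, the staged line-based splitting idea can be sketched in plain Python (a toy illustration of what a chain of SplitText processors achieves, not NiFi itself; the function name and `chunk_bytes` parameter are my own):

```python
def split_by_lines(path, chunk_bytes=50 * 1024 * 1024):
    """Split a large text file into line-aligned chunks of roughly
    chunk_bytes each, without loading the whole file into memory."""
    chunks, current, size = [], [], 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            current.append(line)
            size += len(line.encode("utf-8"))
            if size >= chunk_bytes:
                # Close the current chunk once it reaches the target size.
                chunks.append("".join(current))
                current, size = [], 0
    if current:
        chunks.append("".join(current))
    return chunks
```

The key point mirrored here is that splits happen on line boundaries, so no record is cut in half, which is also why record-oriented splitting (SplitRecord) is the safer choice for structured data.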
07-03-2018
11:27 AM
@Vengai Magan To parse a fixed-width file you can use the ReplaceText processor: write a matching regex that extracts each field into a capture group, then insert a delimiter between the groups in the replacement value. If you know how wide each field is, capture each field into its own group and replace the content with the delimited values. If your file is space-delimited, you can use the regex (.*)\s(.*) and replace with your delimiter. Set the ReplaceText Evaluation Mode to Line-by-Line, so each line of the fixed-width file is read and its content rewritten with delimiters. Once the fields are delimited, you can use the ConvertRecord processor to read and write the data in your required format. If the file is a couple of gigabytes, it is better to split it into small chunks before ReplaceText and feed the split files to that processor. In addition, ConvertRecord supports ScriptedReader/ScriptedWriter controller services, which read the incoming flowfile using a script you provide and write the flowfile contents according to the configured writer. Some references on parsing fixed-width files are here and here; references on splitting a big CSV file into smaller chunks are here and here.
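The capture-group-plus-delimiter idea can be checked quickly in Python's `re` module (an illustration only, with a made-up three-field layout; NiFi's ReplaceText uses Java regex, which behaves the same for patterns like this):

```python
import re

# Hypothetical fixed-width layout: name (8 chars), code (4 chars), qty (5 chars).
FIXED = re.compile(r"^(.{8})(.{4})(.{5})$")

def to_delimited(line, sep="|"):
    """Capture each fixed-width field in a group and rejoin with a
    delimiter, the same idea as the ReplaceText regex/replacement above."""
    m = FIXED.match(line)
    if m is None:
        raise ValueError("line does not match the fixed-width layout")
    return sep.join(g.strip() for g in m.groups())
```

For example, `to_delimited("apples  A1  12   ")` produces `apples|A1|12`, which a record reader can then parse with `|` as the delimiter.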
07-02-2018
10:24 AM
@Souveek Ray Use the regexp_extract function. Since you are splitting on a space, write a matching regex whose first capture group extracts everything up to the first space (the first word) and whose second capture group excludes the first word. Example, with firstword secondword thirdword as the value:

hive> select regexp_extract("firstword secondword thirdword","^(.*?)\\s(.*)",1) first_word,
             regexp_extract("firstword secondword thirdword","^(.*?)\\s(.*)",2) not_first_word;
+-------------+-----------------------+--+
| first_word  | not_first_word        |
+-------------+-----------------------+--+
| firstword   | secondword thirdword  |
+-------------+-----------------------+--+

In the example above, regexp_extract with capture group 1 returns the first word, and with capture group 2 it returns everything after it (excluding the first word).
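The same pattern can be sanity-checked in Python's `re` module (a quick illustration; Hive uses Java regex, which treats this particular pattern the same way):

```python
import re

# Lazy (.*?) stops at the first space; greedy (.*) takes the rest.
m = re.match(r"^(.*?)\s(.*)", "firstword secondword thirdword")
first_word = m.group(1)       # everything up to the first space
not_first_word = m.group(2)   # everything after the first space
print(first_word)      # firstword
print(not_first_word)  # secondword thirdword
```

The lazy quantifier in group 1 is what guarantees the split happens at the first space rather than the last one.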
06-29-2018
09:29 AM
@Murat Menteşe Since your XML doc has an array [] in it, I'm not sure how to write a matching XSLT: the current XSLT converts the array XML into an object/element and adds "" for the array []. For large data you have to increase the Maximum Buffer Size (default 1 MB) based on your flowfile size, because this processor takes the whole flowfile into memory and does all the replacing according to the configured properties.
06-29-2018
09:18 AM
@Amira khalifa Use one of the methods from the link shared above to take only the header out of the CSV file. Then, in ReplaceText, search for (&|\(|\)|\/_|\s) and set the Replacement Value to an empty string; this searches the header flowfile for all the special characters and replaces them with nothing. Finally, merge this header flowfile back with the non-header flowfile. The full explanation and template.xml are shared in this link.
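The same search-and-strip can be tried in Python's `re` module before wiring it into ReplaceText (the sample header string is made up for illustration):

```python
import re

# Matches any of: & ( ) /_ or whitespace, exactly as in the ReplaceText search value.
pattern = r"(&|\(|\)|\/_|\s)"

header = "first name&(last name)/_email"
cleaned = re.sub(pattern, "", header)  # replace every match with empty string
print(cleaned)  # firstnamelastnameemail
```

Testing the regex this way first makes it easy to confirm no wanted characters are being stripped before applying it to the header flowfile.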
06-27-2018
09:40 PM
1 Kudo
@Abhilash Chandrasekharan Try the below syntax:

hive> select regexp_replace('HA^G^FER$JY',"\\^","\\$");
+---------------+--+
|      _c0      |
+---------------+--+
|  HA$G$FER$JY  |
+---------------+--+

Since ^ and $ are reserved characters in regex, use two backslashes to escape them in the regexp_replace function.
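The same escaping concern shows up in Python's `re` module (a quick cross-check; in a Python raw string a single backslash escapes the caret, whereas Hive string literals need the doubled backslash shown above):

```python
import re

# ^ is a regex metacharacter, so it must be escaped in the pattern.
# In re.sub's replacement string a plain $ has no special meaning.
result = re.sub(r"\^", "$", "HA^G^FER$JY")
print(result)  # HA$G$FER$JY
```

The extra backslash in the Hive version exists because the string literal layer consumes one backslash before the regex engine ever sees the pattern.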
06-27-2018
12:03 PM
1 Kudo
@Vladislav Shcherbakov Before the ReplaceText processor, use an EvaluateJsonPath processor to extract the JSON values and keep them as flowfile attributes. Add all of your properties (case sensitive) in this processor, set the Destination to flowfile-attribute, then feed the success relationship from EvaluateJsonPath to the ReplaceText processor. Flow:

... other processors ...
3. SplitJson
5. EvaluateJsonPath
6. ReplaceText
06-27-2018
11:57 AM
1 Kudo
@Vladislav Shcherbakov You can change the schedule of the ExecuteSQL processor via Right Click --> Configure --> Scheduling tab. ExecuteSQL triggers on either Timer driven or CRON driven scheduling. As shown above, a Run Schedule of 1111111110 sec means the processor triggers once every 1111111110 sec; you can also use units like 1 min or 1 hr instead of specifying seconds. For CRON driven scheduling, use this link to build your cron expression, then change the Scheduling Strategy to CRON driven and put the expression in Run Schedule. Note that the ExecuteSQL processor does not store state; if you want to pull data incrementally, use QueryDatabaseTable or GenerateTableFetch processors instead, as these store state based on the configured maximum-value column and pull only the changes made after the stored state.
06-27-2018
04:02 AM
1 Kudo
@Murat Menteşe Could you share your sample XML file data so that I can reproduce the issue on my side by converting it to JSON format?