Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11263 | 04-15-2020 05:01 PM |
| | 7165 | 10-15-2019 08:12 PM |
| | 3149 | 10-12-2019 08:29 PM |
| | 11601 | 09-21-2019 10:04 AM |
| | 4381 | 09-19-2019 07:11 AM |
05-20-2018
12:29 PM
@Amira khalifa Your timestamp column is in the format 2008:05:17 17:23:01 (with a colon delimiter in the date part). Use the ReplaceText processor with the configs below. Search Value
(\d{4}:\d{2}:\d{2})(\s+)(\d{2}:\d{2}:\d{2})
Replacement Value
'$1$2$3'
Character Set
UTF-8
Maximum Buffer Size
1 MB //change the value as per your flowfile size
Replacement Strategy
Regex Replace
Evaluation Mode
Entire text
Input flowfile: 2008:05:17 17:23:01 --other fields
Output flowfile: '2008:05:17 17:23:01' --other fields
If the answer helped to resolve your issue, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
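Outside NiFi, the same Regex Replace can be checked with a few lines of Java; the sample line and the trailing "--other fields" text are placeholders:

```java
import java.util.regex.Pattern;

public class QuoteTimestampDemo {
    public static void main(String[] args) {
        // One line of the incoming flowfile; the trailing fields are placeholders
        String line = "2008:05:17 17:23:01 --other fields";
        // Same pattern as the ReplaceText Search Value above
        Pattern ts = Pattern.compile("(\\d{4}:\\d{2}:\\d{2})(\\s+)(\\d{2}:\\d{2}:\\d{2})");
        // Same idea as the Replacement Value '$1$2$3': wrap the matched
        // timestamp (date group, whitespace, time group) in single quotes.
        String quoted = ts.matcher(line).replaceAll("'$1$2$3'");
        System.out.println(quoted); // '2008:05:17 17:23:01' --other fields
    }
}
```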
05-20-2018
01:23 AM
1 Kudo
@Sajjad Sarwar Please refer to this link; I think you are facing a similar kind of issue.
05-19-2018
03:24 PM
1 Kudo
@Satish John To generate hive.ddl dynamically in NiFi, use the ConvertAvroToORC processor: it generates the hive.ddl statement dynamically based on the contents of the flowfile and adds it as an attribute to the flowfile. Then use a ReplaceText processor to change the flowfile content to the hive.ddl statement, and a PutHiveQL processor to execute it. By following this method we create Hive tables dynamically. (or) You can write a script to generate the schema dynamically based on the contents of the flowfile. Case 2: Use the ConvertExcelToCSV processor, then use a ConvertRecord processor to convert the CSV data to Avro. Flow: 1.ListFile //stores the state and lists files incrementally
2.FetchFile //fetches files
3.ConvertExcelToCSV //converts Excel format data to CSV
4.ConvertRecord //reader as CSVReader and writer as AvroRecordSetWriter
5.ConvertAvroToORC
6.UpdateAttribute //change the filename to a unique value, i.e. ${UUID()}
7.PutHDFS //HDFS directory location to store data
8.ReplaceText //use the ${hive.ddl} value as the Replacement Value and set the Replacement Strategy to Always Replace
9.PutHiveQL //configure and enable a Hive connection pool controller service
Case 1: You don't need the 3rd processor (ConvertExcelToCSV) from the above flow; use all the other processors as they are.
Case 3: The ConvertExcelToCSV processor converts each sheet in the Excel workbook into a new flowfile. If the sheet names are the same, use a RouteOnAttribute processor to detect and route all those filenames to one relationship by adding a new property with NiFi Expression Language. Flow: use the Case 2 flow and add a RouteOnAttribute processor after 3.ConvertExcelToCSV.
Case 4: If all the CSV files in a group share at least a common name, use a RouteOnAttribute processor to route all of that group's filenames to one relationship. Example: I have 100 files named nifi_pull_1.csv, nifi_pull_2.csv, ... nifi_pull_100.csv; to detect all the files in the group, add a new property in RouteOnAttribute named nifi_pull_tables:
${filename:startsWith("nifi_pull")} //checks the filename and routes all filenames starting with nifi_pull to this relationship
References:
ConvertExcelToCSVProcessor https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.6.0/org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor/
Convert Record processor https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html
Create hive.ddl dynamically https://community.hortonworks.com/articles/108718/ingesting-rdbms-data-as-new-tables-arrive-automagi.html
NiFi Expression Language https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
RouteOnAttribute https://community.hortonworks.com/questions/54811/redirecting-flow-based-on-certain-condition-nifi.html
If the answer addressed your question, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
05-18-2018
08:28 PM
@Saikrishna Tarapareddy
If you want to replace the special characters only in the header line, look at the flow below.
Flow: split the file with a SplitText processor configured with a Line Split Count of 1.
RouteOnAttribute configs:
non_header ${fragment.index:gt(1)} //fragment index 1 is the header line
Feed the non_header relationship to the MergeContent processor. Feed the unmatched relationship to the ReplaceText processor; the unmatched relationship only gets the fragment.index = 1 flowfile, i.e. the header line is in the flowfile content.
ReplaceText processor: apply your logic here to replace the special characters in the flowfile content, then feed the success relationship to the MergeContent processor.
MergeContent processor configs: use Defragment as the Merge Strategy so the processor waits for all fragments before merging. Set the Delimiter Strategy to Text and the Demarcator to shift+enter (a newline).
By following this method we replace only the header line content, wait for all fragments, and merge the flowfile contents back together. Reference flow.xml: replace-header.xml
If the answer helped to resolve your issue, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
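For comparison, a minimal Java sketch of the same header-only cleanup done outside NiFi; the file paths and the character whitelist are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CleanHeaderDemo {
    public static void main(String[] args) throws Exception {
        Path csv = Paths.get("input.csv"); // placeholder input path
        List<String> lines = Files.readAllLines(csv, StandardCharsets.UTF_8);
        if (!lines.isEmpty()) {
            // Replace anything that is not a letter, digit, comma, or underscore
            // in the header line only (the whitelist is just an example).
            String cleanedHeader = lines.get(0).replaceAll("[^A-Za-z0-9,_]", "_");
            lines.set(0, cleanedHeader);
        }
        Files.write(Paths.get("output.csv"), lines, StandardCharsets.UTF_8);
    }
}
```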
05-18-2018
08:00 PM
@Saikrishna Tarapareddy If you want to add a user-defined header without replacing the special characters in the existing header line, use an ExecuteStreamCommand processor with the configs below; in that processor we route all the lines except the first one, so we end up with the flowfile without its header. Then use a ReplaceText processor with Prepend as the Replacement Strategy to add your user-defined header to the file. Search Value (?s)(^.*$)
Replacement Value <user-defined-header>
Character Set UTF-8
Maximum Buffer Size 1 MB //change the value as per your flowfile size
Replacement Strategy Prepend
Evaluation Mode Entire text
By using this method we do not keep the header line from the file; we then add the new header to the flowfile content using the ReplaceText processor. (or) Instead of using an ExecuteStreamCommand processor, we can achieve this with record-oriented processors (like ConvertRecord): configure/enable a CSVReader and CSVRecordSetWriter as controller services to read the flowfile content, set the Include Header Line value to false in the CSVRecordSetWriter controller service, then use a ReplaceText processor to prepend the header. With this method we also need to define the header in the ReplaceText processor. https://community.hortonworks.com/questions/183313/how-to-change-csv-attributeheader-name-in-apache-n.html
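A small Java sketch of the equivalent logic outside NiFi, i.e. drop the original header line and prepend a user-defined one; the file paths and the sample header string stand in for your own values (the original post leaves the header as a placeholder):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PrependHeaderDemo {
    public static void main(String[] args) throws Exception {
        // Stands in for whatever header you prepend in the ReplaceText processor
        String userDefinedHeader = "col_a,col_b,col_c";
        List<String> lines = Files.readAllLines(Paths.get("input.csv"), StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        out.add(userDefinedHeader);                      // prepend the new header
        if (!lines.isEmpty()) {
            out.addAll(lines.subList(1, lines.size()));  // keep everything except the old header
        }
        Files.write(Paths.get("output.csv"), out, StandardCharsets.UTF_8);
    }
}
```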
05-18-2018
04:03 PM
@Shantanu kumar Please refer to this link for the actual implementation of LazyOutputFormat.
05-18-2018
12:41 PM
2 Kudos
@Harshali Patel LazyOutputFormat won't create any empty files in HDFS directories. When a MapReduce job runs, the output files (part-nnnnn) are created by the reducers, and a reducer's output can be zero, one, or more records. If there are no records for a specific partition and you use LazyOutputFormat in the driver class, no empty file is created in the HDFS directory. If we don't use LazyOutputFormat, all of those empty files are created in HDFS, and reading data from those directories later causes a performance impact on the jobs. As a bottom line, use LazyOutputFormat if you want to suppress empty file creation, so that an output file is created only when the first record is generated for that partition.
If the answer addressed your question, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
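A minimal driver sketch showing where LazyOutputFormat is wired in; the mapper/reducer setup is omitted, and the job name and key/value types are arbitrary placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);
        // setMapperClass / setReducerClass omitted; set them for your own job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Wrap the real output format so part files are created lazily,
        // i.e. only when the first record is written by a task.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```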
05-17-2018
12:59 AM
@K Henrie If the PDF file path doesn't change, you can use an UpdateAttribute processor after EvaluateJsonPath and add the new properties there. Then use a FetchFile processor to fetch the PDF file for every receipt, and a PutEmail processor to send the attached PDF file and include all the attributes.
Flow:
--other processors--
1.EvaluateJsonPath //extract values as attributes
2.UpdateAttribute //add directory and filename attributes to the flowfile
3.FetchFile //fetch the PDF file; this overwrites the flowfile content, but the attributes stay the same
4.PutEmail
By using this way we fetch the PDF file for every email receipt. (or) Method 2: keep your PDF file in the distributed map cache server using a PutDistributedMapCache processor with some id, and configure/enable the DistributedMapCacheClientService and DistributedMapCacheServer controller services. Then use the same DistributedMapCacheClientService controller service in a FetchDistributedMapCache processor, with the same id specified when putting the PDF file in PutDistributedMapCache, to fetch the file from the cache server. This way we don't fetch the file from a local directory each time; instead we load it into NiFi once and pull it from the distributed cache.
Flow 1: put the PDF file into the distributed map cache:
1.GetFile //get the file from the directory (or you can use ListFile/FetchFile processors)
2.UpdateAttribute //add the Cache Entry Identifier key and value
3.PutDistributedMapCache //put the PDF file into the distributed map cache
Flow 2: actual flow to send mail with the PDF file attached:
--other processors--
1.EvaluateJsonPath
2.UpdateAttribute //add the Cache Entry Identifier key and value
3.FetchDistributedMapCache //fetch the PDF file
4.PutEmail
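For reference, this is roughly what PutEmail does when it attaches the fetched PDF, sketched with the JavaMail API; the SMTP host, addresses, and file path are placeholders, not values from the original question:

```java
import java.util.Properties;
import javax.mail.Message;
import javax.mail.Multipart;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class EmailPdfDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com"); // placeholder SMTP host
        Session session = Session.getInstance(props);

        MimeMessage msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("nifi@example.com"));
        msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse("user@example.com"));
        msg.setSubject("Receipt");

        MimeBodyPart text = new MimeBodyPart();
        text.setText("Please find the receipt attached.");
        MimeBodyPart attachment = new MimeBodyPart();
        attachment.attachFile("/tmp/receipt.pdf"); // the PDF fetched by FetchFile / the cache

        Multipart body = new MimeMultipart();
        body.addBodyPart(text);
        body.addBodyPart(attachment);
        msg.setContent(body);

        Transport.send(msg);
    }
}
```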
If the answer addressed your question, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
05-16-2018
10:00 AM
1 Kudo
@Manuel Carnerero As you are extracting a property needed to build the query, also extract all the other required content from the input JSON and keep it as attributes on the flowfile. Run your join with the extracted property; once you get the results back, the flowfile content has changed, but the extracted attributes are not lost. So extract all the required information from the join result content as well and keep it as attributes on the flowfile. Now that you have all the required values as attributes, use an AttributesToJSON processor and list all your attribute names in it; based on the attribute values, the processor creates a new JSON message.
Flow: AttributesToJSON processor configs: inputjson_attr1, inputjson_attr2 are extracted in the first EvaluateJsonPath processor, joinjson_attr1, joinjson_attr2 are extracted in the second EvaluateJsonPath processor, and we merge both results into one JSON message. (or) Method 2, using a ReplaceText processor: after the ConvertAvroToJSON processor we can also use a ReplaceText processor to search for } (or }] if Wrap Single Record is set to true) and replace it with all the attributes that were extracted from the input JSON message. Flow: ReplaceText configs: Search Value
}$
Replacement Value
,"inputjson_attr1":"${inputjson_attr1}","inputjson_attr2":"${inputjson_attr2}" } Maximum Buffer Size
1 MB //change the size according to your feeding input flowfile size
Replacement Strategy
Regex Replace
Evaluation Mode
Entire text
We prepare the key/value pairs in the ReplaceText processor by searching for the end of the JSON message and replacing it with the input JSON attributes, adding the closing curly brace back at the end. You can choose either of these approaches as per your requirements; I would suggest using the AttributesToJSON processor to prepare the new merged JSON message rather than the ReplaceText processor.
If the answer addressed your question, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
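A small Java sketch of the Method 2 idea: the same regex finds the closing brace at the end of the JSON and splices the attribute key/value pairs in before it. The sample JSON and the hard-coded values standing in for ${inputjson_attr1} and ${inputjson_attr2} are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendAttributesDemo {
    public static void main(String[] args) {
        // Flowfile content as it might look after the join / ConvertAvroToJSON step
        String json = "{\"join_field\":\"abc\",\"join_value\":42}";
        // Values that ReplaceText would pull from the flowfile attributes
        String attr1 = "value1"; // ${inputjson_attr1}
        String attr2 = "value2"; // ${inputjson_attr2}
        // Same idea as the Regex Replace config: match the closing brace at the
        // end of the text and replace it with the extra pairs plus a new brace.
        Pattern endBrace = Pattern.compile("}$");
        String replacement = Matcher.quoteReplacement(
                ",\"inputjson_attr1\":\"" + attr1 + "\",\"inputjson_attr2\":\"" + attr2 + "\"}");
        String merged = endBrace.matcher(json).replaceFirst(replacement);
        System.out.println(merged);
        // {"join_field":"abc","join_value":42,"inputjson_attr1":"value1","inputjson_attr2":"value2"}
    }
}
```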
05-16-2018
09:08 AM
1 Kudo
@Ya ko Yes, it's possible with NiFi's record-oriented processors (ConvertRecord). Flow:
1. ConsumeKafka //read data from Kafka
2. ConvertRecord //convert JSON format data to CSV format
3. PutHDFS //write the CSV format data to HDFS
In the ConvertRecord processor we configure a Record Reader (JsonTreeReader) to read the incoming JSON data and a Record Writer controller service (CSVRecordSetWriter) to write the output flowfile in CSV format; we need to define an Avro schema for the Record Reader/Writer controller services. Follow this tutorial for more details and configurations regarding the ConvertRecord processor.
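To illustrate the JSON-to-CSV conversion that ConvertRecord performs, here is a small standalone sketch using the jackson-dataformat-csv library outside NiFi; the record fields and schema columns are invented for the example and play the role of the shared schema used by the reader/writer controller services:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.Map;

public class JsonToCsvDemo {
    public static void main(String[] args) throws Exception {
        // One Kafka record in JSON form; the field names are made up for the example
        String json = "{\"id\":1,\"name\":\"nifi\",\"ts\":\"2018-05-16\"}";
        Map<String, Object> record = new ObjectMapper().readValue(json, Map.class);
        // The CSV schema defines the column order, much like the schema shared by
        // the JsonTreeReader / CSVRecordSetWriter controller services.
        CsvSchema schema = CsvSchema.builder()
                .addColumn("id")
                .addColumn("name")
                .addColumn("ts")
                .setUseHeader(true)
                .build();
        String csv = new CsvMapper().writer(schema).writeValueAsString(record);
        System.out.print(csv); // id,name,ts  then  1,nifi,2018-05-16
    }
}
```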